Introduction



1.1 Motivation 

Almost any technical or economic decision problem includes some degree of 
uncertainty about the values to assign to some problem-specific parameters. 
The best decision strategies with respect to some objective criterion must be 
found on the basis of the a priori information about these uncertainties. If 
it is possible to assign a probability distribution to the random parameters, 
the determination of an optimal decision strategy gives rise to a stochastic 
optimization model, also referred to as a stochastic program. However, the 
solution of stochastic programs poses severe difficulties, especially in the mul- 
tistage case. If the underlying probability space is continuous, the stochas- 
tic program represents an optimization problem over an infinite-dimensional 
function space. Then, analytical solutions are rarely available, and nontrivial
problems of practical relevance must be solved numerically. Numerical
solution, however, requires discretization of the continuous probability space.
One should select a discrete probability measure with finite support and solve 
the stochastic program with respect to this discrete auxiliary measure, instead 
of the continuous original measure. In doing so, one effectively approximates 
the original stochastic program by an optimization problem over a finite- 
dimensional Euclidean space, which is numerically tractable. 
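As a minimal illustration of this idea (not taken from the text), the sketch below assumes a hypothetical single-stage newsvendor problem: order $x \ge 0$ at unit cost 1 and sell $\min(x, d)$ at unit price 2, where the demand $d$ is uniformly distributed on $[0, 1]$. The continuous problem has optimal value 0.25 at $x = 0.5$, because the expected profit equals $x - x^2$ on $[0, 1]$. The midpoint discretization and all function names are assumptions chosen for illustration. They replace the continuous measure by $n$ equiprobable mass points, which yields a finite-dimensional auxiliary problem.

```python
def expected_profit(x, demands, probs, price=2.0, cost=1.0):
    """Expected newsvendor profit E[price * min(x, d) - cost * x] under a
    discrete demand measure (hypothetical example data)."""
    return sum(p * (price * min(x, d) - cost * x) for d, p in zip(demands, probs))


def midpoint_discretization(n):
    """n equiprobable mass points at the cell midpoints of [0, 1]; they stand in
    for the continuous Uniform[0, 1] demand distribution."""
    return [(i + 0.5) / n for i in range(n)], [1.0 / n] * n


def solve_discretized(n, price=2.0, cost=1.0):
    """Solve the auxiliary problem obtained from the n-point discretization."""
    demands, probs = midpoint_discretization(n)
    # The discrete objective is concave and piecewise linear in x with kinks at
    # the mass points, so its maximum over x >= 0 is attained at 0 or a mass point.
    candidates = [0.0] + demands
    best = max(candidates, key=lambda x: expected_profit(x, demands, probs, price, cost))
    return best, expected_profit(best, demands, probs, price, cost)


for n in (5, 101):
    print(n, solve_discretized(n))
```

For odd $n$ the discretized problem returns $x = 0.5$ with optimal value $0.25 + 1/(4n^2)$, so the discrete optimum approaches the true one as the number of mass points, and with it the computational effort, grows.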

The selection of an appropriate discrete probability measure is referred 
to as scenario generation and represents a primary challenge in the field of 
stochastic programming. It is indispensable that the solution of the approx- 
imate problem can be related in some way to the solution of the original 
problem, i.e. the exact solution of the auxiliary problem should provide an 
approximate solution of the original stochastic program. Ideally, one can find 
a discrete probability measure such that the optimal value and the optimizer 
set of the associated auxiliary stochastic program are, in a quantitative sense, 
close to the optimal value and the optimizer set of the original optimization 
problem, respectively. However, there is always a tradeoff between accuracy,
requiring as many discretization points as possible, and computational effort, 
which increases dramatically with the number of discretization points. 

In this work we will always assume that the stochastic program under con- 
sideration has several decision stages. For the sake of a suggestive terminology, 
we will moreover assume that the objective criterion is to maximize expected 
profit. At each decision stage a realization of some stochastic parameter is 
observed in response to which a decision is selected. Typically, the decision is 
chosen so as to maximize the sum of a certain immediate profit and the con- 
ditional expectation of an uncertain future profit. In addition, the decision is 
subject to specific constraints, which may depend on the random parameters 
observed by that time. 

The recourse function (profit-to-go function) of a specific stage is defined as 
the expected future profit conditional on the observations up to that stage and 
the decisions selected in the previous stages. As in dynamic programming, the 
recourse functions can be calculated recursively by iterative application of the 
operations ‘maximization’ and ‘integration’ with respect to some conditional 
probability measure. Knowledge of the structural properties of the recourse 
functions (measurability, convexity, concavity, subdifferentiability, Lipschitz 
continuity, etc.) is of fundamental importance for efficient scenario generation. 



1.2 Previous Research 

Let us briefly summarize some scenario generation methods which have re- 
ceived considerable attention in the stochastic programming literature. We 
primarily focus on bounding methods and optimal discretization by using 
probability metrics. Other approaches (e.g. conditional sampling, moment 
matching, path-based methods, etc.) are of minor importance for our pur- 
poses and will therefore be omitted in the subsequent discussion. A survey of 
these methods is provided in [62]. 

First, we address the class of bounding methods, which have enjoyed growing
popularity since the 1960s. Here, one attempts to construct two discrete
probability measures such that the optimal values of the associated approx- 
imate problems represent upper and lower bounds for the optimal value of 
the original stochastic program, respectively. In developing bounding prob- 
ability measures, one typically exploits structural properties of the recourse 
functions. When the recourse functions of all decision stages are convex in 
the random parameters, a lower bounding measure can be found by means 
of Jensen’s inequality [57], which is based on utilizing only first moment in- 
formation. Notice that Jensen’s inequality applies under fairly general condi- 
tions, e.g. for multivariate random vectors with correlated components and 
an unbounded support. If, however, the support of the random variables
represents a compact parallelepiped, a first order upper bound is provided via
the Edmundson-Madansky inequality; Edmundson [35] concentrates on the 
univariate case, whereas Madansky [69,70] and Frauendorfer [41] consider the 
multivariate setting, given that the components of the random vector are in- 
dependent and dependent, respectively. In the dependent case, not only the 
mean but also the first order cross moments of the underlying random vector 
are used. Gassmann and Ziemba [47] generalize the Edmundson-Madansky 
bound to dependent random variables defined on polyhedral sets, while Birge 
and Wets [11, 12] allow for unbounded domains. 
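For a univariate random variable $\xi$ with support $[a, b]$ and a convex function $\phi$, both first order bounds can be evaluated from the mean alone. The sketch below illustrates this under these assumptions (the function names are hypothetical): the Jensen bound evaluates $\phi$ at $\mathbb E\xi$, and the Edmundson-Madansky bound places mass on the endpoints $a$ and $b$ such that the two-point measure has the same mean as $\xi$. The uniform distribution on $[0, 1]$ serves as a test case.

```python
import math


def jensen_lower(phi, mean):
    """Jensen: E[phi(xi)] >= phi(E[xi]) for convex phi; uses the mean only."""
    return phi(mean)


def edmundson_madansky_upper(phi, mean, a, b):
    """Edmundson-Madansky: for convex phi and xi supported on [a, b],
    E[phi(xi)] <= p_a * phi(a) + p_b * phi(b), where the two-point measure
    on {a, b} has the same mean as xi."""
    p_b = (mean - a) / (b - a)
    return (1.0 - p_b) * phi(a) + p_b * phi(b)


# xi ~ Uniform[0, 1] (an assumption for illustration), so E[xi] = 0.5.
for phi, exact in [(lambda s: s * s, 1.0 / 3.0), (math.exp, math.e - 1.0)]:
    lo = jensen_lower(phi, 0.5)
    hi = edmundson_madansky_upper(phi, 0.5, 0.0, 1.0)
    print(f"{lo:.4f} <= {exact:.4f} <= {hi:.4f}")
# prints "0.2500 <= 0.3333 <= 0.5000" and "1.6487 <= 1.7183 <= 1.8591"
```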

When the difference between upper and lower bounds is larger than some 
prescribed tolerance, one can try to incorporate higher order information 
in the construction of bounding measures. Dokov and Morton [23] develop 
lower bounds based on second order information, while Birge and Dula [9], 
Dupačová [30], and Kall [59] propose second order upper bounds. Higher order
upper bounds are provided in [24]. All of the bounds mentioned so far can be
improved to an arbitrary degree of precision by applying them on sufficiently
small subsets of the underlying domain. This technique, known as
partitioning, is exemplified in [11,45,56].
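Partitioning can be sketched in the same univariate setting. Applying the Jensen and Edmundson-Madansky bounds conditionally on each cell of a partition of $[a, b]$, and weighting by the cell probabilities, yields tighter bounds. The sketch assumes $\xi$ uniform on $[a, b]$ and equal-width cells, so that each cell's conditional mean is its midpoint. Both choices, and the helper name, serve only as an illustration.

```python
def partitioned_bounds(phi, a, b, cells):
    """Jensen (lower) and Edmundson-Madansky (upper) bounds on E[phi(xi)] for
    convex phi and xi ~ Uniform[a, b], applied cell by cell on an equal partition."""
    width = (b - a) / cells
    lower = upper = 0.0
    for j in range(cells):
        left, right = a + j * width, a + (j + 1) * width
        prob = 1.0 / cells                 # P(xi in cell j) under the uniform law
        mid = 0.5 * (left + right)         # conditional mean of xi on cell j
        lower += prob * phi(mid)
        # The conditional mean is the cell midpoint, so E-M puts mass 1/2 on each end.
        upper += prob * 0.5 * (phi(left) + phi(right))
    return lower, upper


for cells in (1, 2, 4, 8):
    lo, hi = partitioned_bounds(lambda s: s * s, 0.0, 1.0, cells)
    print(cells, lo, hi, hi - lo)
```

For $\phi(s) = s^2$ the gap between the two bounds equals $1/(4k^2)$ for $k$ cells, so refining the partition drives both bounds to the true value $1/3$.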

If the recourse functions are concave, Jensen's inequality and its
generalizations yield upper bounds, whereas the Edmundson-Madansky-type
inequalities furnish lower bounds. This is simply a consequence of the fact
that multiplication by -1 maps any convex function to a concave function and
vice versa.

The above bounds can be generalized to the case when the recourse func- 
tions are convex-concave saddle functions. First order bounds similar to those 
of Jensen and Edmundson-Madansky are developed by Frauendorfer [42] un- 
der the assumption that the underlying random vector is defined on a mul- 
tidimensional rectangle or a cross-simplex (i.e. the Cartesian product of two 
simplices). Edirisinghe and Ziemba [33,34] generalize these bounds to ran- 
dom variables on an arbitrary polyhedral probability space. Moreover, Ediri- 
singhe [31] constructs bounding measures based on second order information; 
see also the similar approach presented in [111]. Suitable partitioning schemes 
for bounds on saddle functions are proposed in [32,42]. 

Notice that several of the bounding measures presented above may be 
derived as solutions of generalized moment problems [12,33,34,42,58]. This 
interpretation of bounds for stochastic programs originates from the work
of Dupačová in a game-theoretic setting [30] and also has applications in
models where the underlying probability distribution is only known in a limited
manner [25-27].

The classical bounding methods for scenario generation fail unless the 
recourse functions of a given stochastic program can be shown to be convex, 
concave, or saddle-shaped in the stochastic parameters. In order to solve more
general optimization problems, one should relax this restrictive requirement. For
instance, if the recourse functions exhibit some local Lipschitzian properties,
one can apply scenario generation by optimal discretization due to Pflug [85].
This approach aims at finding a discrete probability measure with a prescribed 
number of mass points which is close to the original measure with respect 
to some probability metric. Then, one can prove that the optimal value of 
the original problem differs from the optimal value of the discrete auxiliary 
problem at most by the distance of the two measures, with respect to the given 
probability metric, multiplied by an appropriate Lipschitz constant. Rachev 
and Römisch [87] use a similar method to optimally reduce the number of mass
points of a discrete probability measure obtained via sampling; see also [29]. 
This scenario reduction technique can occasionally decrease computational 
complexity of a stochastic program without seriously affecting its performance. 
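The text leaves the probability metric unspecified. As one concrete choice, assumed here for illustration, the sketch below uses the Wasserstein-1 (Kantorovich) distance between two discrete measures on the real line. For this metric, $|\mathbb E_P f - \mathbb E_Q f| \le L\, d(P, Q)$ holds for every $L$-Lipschitz function $f$. The sketch checks this inequality for one fixed Lipschitz function, not for the optimal value of a full stochastic program. The data and function names are hypothetical.

```python
def wasserstein1(xs, ps, ys, qs):
    """Wasserstein-1 (Kantorovich) distance between two discrete measures on the
    real line, computed as the integral of |F - G| between merged support points."""
    points = sorted(set(xs) | set(ys))

    def cdf(support, probs, t):
        return sum(p for s, p in zip(support, probs) if s <= t)

    # F and G are constant on [a, b) between consecutive support points.
    return sum(abs(cdf(xs, ps, a) - cdf(ys, qs, a)) * (b - a)
               for a, b in zip(points, points[1:]))


def expectation(f, support, probs):
    return sum(p * f(s) for s, p in zip(support, probs))


def lipschitz_f(s):
    return abs(s - 0.3)               # Lipschitz continuous with constant L = 1


# Hypothetical data: a 4-point "sampled" measure and a 2-point reduced measure.
sample, sample_p = [0.1, 0.4, 0.7, 0.9], [0.25] * 4
reduced, reduced_p = [0.25, 0.8], [0.5, 0.5]
dist = wasserstein1(sample, sample_p, reduced, reduced_p)
gap = abs(expectation(lipschitz_f, sample, sample_p)
          - expectation(lipschitz_f, reduced, reduced_p))
print(dist, gap, gap <= 1.0 * dist)
```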



1.3 Objective 

From the point of view of applications, most of the scenario generation meth- 
ods presented in the previous section suffer from more or less serious defi- 
ciencies. For example, use of the classical bounding methods is restricted to 
problems with either convex, concave, or - more generally - saddle-shaped re- 
course functions. Loosely speaking, the following standard assumptions suffice 
to imply the saddle property of the recourse functions (see Chap. 3): 

(a) the stochastic program represents a convex optimization problem (with- 
out loss of generality we focus on maximization problems); 

(b) the immediate profit in each stage represents a convex function of the 
stochastic parameters; 

(c) the constraint functions are jointly convex in the decision variables and 
the stochastic parameters; 

(d) the stochastic parameters follow autoregressive processes driven by seri- 
ally uncorrelated noise. 

Although the first requirement may seem fairly restrictive, many important
real-life decision problems can be formulated as convex stochastic programs.
In contrast, conditions (b)-(d) are restrictive, especially in the multistage
case. For example, stochastic programs involving lognormally distributed
prices and demands, derivative trading, or risk-aversion do not meet the
requirements (b), (c), and (d) jointly. While being practically relevant, such
problems go beyond the scope of the classical bounding methods.
Notice that the modern methods based on probability metrics apply to
broader classes of decision problems. For a brief outline of the limitations of 
these approaches we refer to [62, Sect. 4]. 

The present work concentrates on bounding methods. In particular, we 
will use the barycentric approximation scheme by Frauendorfer [42-44] as a 
starting point. This method is adapted for problems which satisfy the condi- 
tions (a)-(d) and provides two complementary discrete probability measures. 
Solution of a stochastic program with respect to these barycentric measures 
yields lower and upper bounds for the true optimal value, respectively. By 
successively partitioning the domain of the random vectors, the bounds can 
be made arbitrarily tight. 

In this book we interpret the barycentric discretizations and the associated 
auxiliary stochastic programs in a more general setting. Concretely speaking, 
we drop the restrictive assumptions (b) and (c), thus investigating convex 
multistage stochastic programs with a generalized nonconvex dependence on 
the random variables. Then, the optima of the discretized auxiliary problems 
can no longer be shown to represent strict bounds on the true optimal value. 
However, we will argue that they still provide bounds after a simple trans- 
formation. The main goal of the present work is to characterize this trans- 
formation in an intuitive though mathematically rigorous way. In this regard, 
we will formulate weak regularity conditions under which the new bounding 
method is applicable. Thereby, we identify broad problem classes for which 
easily computable bounds are available. Moreover, it will be shown that the 
new bounds can still be made tight via partitioning. 

As a second major objective, we present a collection of important real-life 
decision problems in the field of power management that can be addressed 
with our new approach. These problems go beyond the scope of the classical 
bounding methods. Their formulation as multistage stochastic programs re- 
quires specific modelling techniques, which will be investigated systematically. 
Numerical calculations are provided for illustrative purposes. 



1.4 Outline 

This book is structured as follows. In Chap. 2 we briefly review the basic 
theory of stochastic optimization. After some introductory remarks about the 
mathematical representation of uncertainty, we give precise definitions of de- 
cision strategies, constraints, and objective functions. Subsequently, the static 
and dynamic versions of a general non-linear stochastic program are formu- 
lated, and a set of fundamental regularity conditions is introduced. Under 
these regularity conditions, the static and dynamic versions of a stochastic 
program can be shown to be well-defined, solvable, and equivalent. At the
end of the chapter we will sketch a few approaches to evaluate the benefit of 
using a stochastic optimization model. 

Chapter 3 specializes the basic theory of stochastic optimization to the 
important class of convex stochastic programs. Here, we work with a set of 
stronger regularity conditions which provide sharper results than in Chap. 2. 
Given these stronger conditions, the recourse functions of any convex stochas- 
tic program are subdifferentiable and exhibit a characteristic saddle structure. 
The presented theorems slightly generalize some well-known results. 

Chapter 4 deals with scenario generation based on classical barycentric ap- 
proximation. We start by studying generalized moment problems whose dual 
solutions represent extremal probability measures. These extremal measures 
can be used to synthesize two discrete bounding measures for stochastic pro- 
grams which satisfy the regularity conditions of Chap. 3. Next, we recover 
the main result of [43], i.e. the auxiliary stochastic programs associated with 
the discrete barycentric measures provide bounds on the true optimal value. 
Moreover, we propose a slightly generalized partitioning method to improve 
the bounds. Finally, we introduce a sequence of bounding sets for the
optimal first-stage decisions. This sequence is shown to converge to the max- 
imizer set of the underlying stochastic program. Our analysis sharpens the 
epi-convergence results of Birge and Wets [11]. 

In Chap. 5 we relax the regularity conditions of Chap. 3 to allow for convex 
stochastic programs with a nonconvex dependence on the stochastic parame- 
ters. Under weak assumptions, such ‘irregular’ problems can be transformed 
to ‘regular’ problems, whose recourse functions are subdifferentiable and ex- 
hibit a characteristic saddle structure. The classical barycentric bounds for 
the regular problems can then be back-transformed to yield bounds on the 
corresponding irregular problems. In the case of linear stochastic programs, 
we derive a particularly intuitive formula for the new bounds. 

Chapter 6 presents exemplary real-life applications of the theoretical con- 
cepts developed in Chap. 5. Concretely speaking, it will be shown how market 
power, lognormal stochastic processes, and risk-aversion can be properly han- 
dled in a stochastic programming framework. After a detailed exposition of 
appropriate modelling techniques, we report on numerical experience with the 
new bounding method. Finally, Chap. 7 concludes. 
In this chapter we develop the basic theory of stochastic optimization. After
some introductory remarks about the mathematical representation of
uncertainty, we investigate the key ingredients of general stochastic programs,
i.e. decision strategies, constraints, and objective functions. Subsequently, the
static and dynamic versions of a stochastic optimization problem are
formulated, and some elementary regularity conditions are discussed. Under these
regularity conditions, the static and dynamic versions of the stochastic
program at hand can be shown to be well-defined, solvable, and equivalent.
Finally, we discuss two useful indicators, which enjoy wide popularity in the
literature: the expected value of perfect information (EVPI) represents the
maximum amount to be paid in return for complete and accurate information about
the future, whereas the value of the stochastic solution (VSS) quantifies the
cost of ignoring uncertainty in choosing a decision.



2.1 Modelling Uncertainty 

Discrete-time multistage stochastic programs are built on a probability space
$(\hat\Omega, \hat{\mathcal F}, \hat P)$ equipped with a filtration, i.e. an increasing sequence of
σ-algebras $\{\hat{\mathcal F}^t\}_{t\in\tau}$ included in $\hat{\mathcal F}$. Thus, we have $\hat{\mathcal F}^s \subseteq \hat{\mathcal F}^t$ for all $s < t$
lying within the finite index set $\tau := \{0, \ldots, T\}$. Sometimes, we will also need
the related index sets $\tau_{-t} := \tau \setminus \{t\}$, $t \in \tau$. The parameter $t$ is normally
interpreted as a time index, and $T$ usually indexes the last decision stage of a
given stochastic program. Without loss of generality we may postulate
$\hat{\mathcal F}^T = \hat{\mathcal F}$. In case of a finite probability space, it is usually assumed that the
σ-algebra $\hat{\mathcal F}$ coincides with the power set $2^{\hat\Omega}$. The measurable space $(\hat\Omega, \hat{\mathcal F})$
is called sample space; elements of $\hat\Omega$ are called outcomes, and elements of $\hat{\mathcal F}$
are referred to as events. Furthermore, the σ-algebra $\hat{\mathcal F}^t$ can conveniently be
seen as the information available at time $t$ or as the σ-algebra of events up to
time $t$.
A random variable or a random vector is defined as a measurable function
$\tilde\omega : (\hat\Omega, \hat{\mathcal F}) \to (\Omega, \mathcal B(\Omega))$, where $\Omega$ denotes the state space comprising the
so-called observations. In the context of stochastic programming, $\Omega$ is usually
taken to be a compact subset of $\mathbb R^M$, and $\mathcal B(\Omega)$ denotes the Borel field¹ of the
state space. Furthermore, a stochastic process is understood to be a family of
random variables $\{\tilde\omega_t\}_{t\in\tau}$, $\tilde\omega_t : (\hat\Omega, \hat{\mathcal F}) \to (\Omega_t, \mathcal B(\Omega_t))$. Again, $\Omega_t$ is supposed
to be a compact subset of $\mathbb R^{M_t}$. By convention, a stochastic process is called
$\{\hat{\mathcal F}^t\}$-adapted or $\{\hat{\mathcal F}^t\}$-previsible if $\tilde\omega_t$ is measurable with respect to $\hat{\mathcal F}^t$ or
$\hat{\mathcal F}^{t-1}$, respectively. In many applications, $\tilde\omega_0$ is measurable with respect to the
trivial σ-algebra $\{\emptyset, \hat\Omega\}$. This implies roughly that there is no information at
time 0. By definition, a sequence of $\{\emptyset, \hat\Omega\}$-measurable random variables
represents a deterministic process.

A filtration $\{\hat{\mathcal F}^t\}_{t\in\tau}$ is said to be induced by a process $\{\tilde\omega_t\}_{t\in\tau}$ if $\hat{\mathcal F}^t$
coincides with the σ-field generated by $\bigcup_{s=0}^{t} \{\tilde\omega_s^{-1}(A) \mid A \in \mathcal B(\Omega_s)\}$. The
induced σ-algebra $\hat{\mathcal F}^t$ describes the information which is available at time $t$
by only observing the underlying process. Figure 2.1 shows an exemplary
discrete stochastic process $\{\tilde\omega_t\}_{t\in\tau}$ with four time steps. Every different path
of the corresponding scenario tree, i.e. every possible sequence of observations,
is assigned to one element of the sample space, which is chosen to be
$\hat\Omega = \{\hat\omega_1, \ldots, \hat\omega_8\}$. The atoms of the induced σ-field $\hat{\mathcal F}^t$ can be determined
by forming the union of all paths which are indistinguishable up to time $t$. In
Fig. 2.2, the samples belonging to an atom of $\hat{\mathcal F}^t$ are linked with a shaded line.
By construction, all atoms of stage $t+1$ are subsets of the atoms of stage $t$,
implying that $\hat{\mathcal F}^t \subseteq \hat{\mathcal F}^{t+1}$. This sequential inclusion reflects the idea that events
are never ‘forgotten’, i.e. a filtration basically determines how information is
revealed through time.
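On a finite scenario tree, these atoms can be computed by grouping paths with identical observation prefixes. The sketch below assumes that paths are given as tuples $(\omega_0, \ldots, \omega_T)$. The tree (deterministic root, binary branching) and the function name are hypothetical; this is not the tree shown in Fig. 2.1.

```python
from collections import defaultdict


def atoms(paths, t):
    """Atoms of the sigma-field induced by observing the process up to time t:
    paths (identified by index) share an atom iff they agree at stages 0..t."""
    groups = defaultdict(list)
    for index, path in enumerate(paths):
        groups[tuple(path[: t + 1])].append(index)
    return sorted(groups.values())


# Hypothetical binary tree with four stages and a deterministic root (w_0 = 0).
paths = [(0, a, b, c) for a in (1, 2) for b in (1, 2) for c in (1, 2)]
for t in range(4):
    print(t, atoms(paths, t))
```

Each atom at stage $t+1$ is contained in an atom at stage $t$, which is the finite counterpart of the inclusion of the σ-algebras.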




Fig. 2.1. Example of a stochastic process $\{\tilde\omega_t\}_{t\in\tau}$



¹ The Borel field $\mathcal B(\Omega)$ is the smallest σ-algebra containing all relatively open
subsets of $\Omega$ with respect to the standard topology of $\mathbb R^M$.
For later use we define a random variable $\tilde\omega$ corresponding to the stochastic
process $\{\tilde\omega_t\}_{t\in\tau}$.

$$\tilde\omega := \tilde\omega_0 \times \cdots \times \tilde\omega_T : \hat\Omega \to \Omega_0 \times \cdots \times \Omega_T =: \Omega \qquad (2.1)$$

Every element of the state space $\Omega$ can be identified with an equivalence
class of outcomes in the sample space. Thus, $(\Omega, \mathcal F)$ naturally inherits the
probability measure defined on $(\hat\Omega, \hat{\mathcal F})$ through $P(A) := \hat P(\tilde\omega^{-1}(A))$,
$A \in \mathcal F := \mathcal B(\Omega)$. By construction, $P$ is a regular probability measure on $(\Omega, \mathcal F)$.
Moreover, the state space is equipped with a filtration $\{\mathcal F^t\}_{t\in\tau}$, which is given
through

$$\mathcal F^t := \{A \times \Omega_{t+1} \times \cdots \times \Omega_T \mid A \in \mathcal B(\Omega_0 \times \cdots \times \Omega_t)\}.$$

Instead of the abstract probability space $(\hat\Omega, \hat{\mathcal F}, \hat P)$, we may equivalently
consider the induced probability space $(\Omega, \mathcal F, P)$, which we will use henceforth.
With this convention, $\tilde\omega$ reduces to the identity map, while $\tilde\omega_t$ becomes a
specific coordinate projection for every $t \in \tau$. Moreover, the terms ‘outcome’
and ‘observation’ will from now on be used synonymously. From a conceptual
point of view, it is important to distinguish random variables $\tilde\omega_t$ and their
realizations, which will be denoted by $\omega_t$ below.

By convention, $\mathbb E(\cdot)$ denotes expectation over the probability measure $P$.
Conditional expectations $\mathbb E_t(\cdot) := \mathbb E(\cdot \mid \mathcal F^t)$ on $(\Omega, \mathcal F, P)$ are defined up to an
equivalence relation; i.e. there can be many versions of $\mathbb E_t(\cdot)$, which differ on
$P$-null sets. In this work, $\mathbb E_t(\cdot)$ is taken to be a regular conditional expectation,
which is representable as an indefinite integral with respect to a regular
conditional probability. Such regular conditional probabilities exist since $\mathcal F$ is the
Borel field on $\Omega$ and $P$ is a regular Borel measure [66, Sect. 27].
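On a finite probability space where every scenario has positive probability, the conditional expectation given $\mathcal F^t$ is unique. On each path it reduces to the probability-weighted average over the atom that contains the path. The following sketch computes it and checks the tower property $\mathbb E_0(\mathbb E_1(g)) = \mathbb E_0(g)$. The data and the helper name are hypothetical.

```python
def conditional_expectation(paths, probs, values, t):
    """E(g | F^t) on a finite scenario set: each path receives the probability-
    weighted average of g over all paths sharing its history up to t.
    Assumes every path has positive probability (so there are no null sets)."""
    mass, weighted = {}, {}
    for path, p, g in zip(paths, probs, values):
        key = tuple(path[: t + 1])
        mass[key] = mass.get(key, 0.0) + p
        weighted[key] = weighted.get(key, 0.0) + p * g
    return [weighted[tuple(path[: t + 1])] / mass[tuple(path[: t + 1])] for path in paths]


paths = [(0, 1, 1), (0, 1, 2), (0, 2, 1), (0, 2, 2)]   # hypothetical tree
probs = [0.1, 0.3, 0.2, 0.4]
g = [1.0, 3.0, 5.0, 7.0]
e1 = conditional_expectation(paths, probs, g, 1)        # constant on each stage-1 atom
e0 = conditional_expectation(paths, probs, e1, 0)       # tower property: equals E(g)
print(e1, e0)
```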
2.2 Policies

In the sequel we assume $\{\tilde\omega_t\}_{t\in\tau}$ to characterize some uncertain problem data
with respect to future time periods. In the context of power management,
for instance, electricity prices, load demand, reservoir inflows, and fuel prices
are major uncertain impacts contained in $\tilde\omega_t$. Information on the random data
becomes available successively at finitely many time points, at which decisions
are selected. After a first observation $\omega_0$, an initial decision $x_0$ is taken. At this
stage, the decision maker has no information about future outcomes. Then,
a second observation $\omega_1$ is made, in response to which a subsequent decision
is selected, etc. Generally speaking, after the observation of the outcomes
$(\omega_0, \ldots, \omega_t)$, the decision maker selects some actions $x_t \in \mathbb R^{n_t}$ according
to a specific decision rule.² Such a decision rule depends on the nature of
the underlying problem. Normally, rational decision makers choose actions
maximizing some objective function. In the remainder of this section we aim
at formalizing the concept of decision rules in a slightly more general setting,
allowing also for so-called anticipative policies. By definition, an anticipative
decision rule is a sequence of essentially bounded Borel measurable functions
$\{\tilde x_t\}_{t\in\tau}$, i.e.

$$\tilde x_t \in \mathcal L^\infty_{n_t} := \mathcal L^\infty(\Omega, \mathcal F, P; \mathbb R^{n_t}). \qquad (2.2)$$

Obviously, such a decision rule assigns a well-defined action vector to every
time stage and every possible outcome. With the definitions of Sect. 2.1, any
decision rule can be interpreted as a stochastic process³ $\{\tilde x_t\}_{t\in\tau}$ on $(\Omega, \mathcal F, P)$.
By convention, $x_t \in \mathbb R^{n_t}$ denotes a realization of $\tilde x_t$. In order to simplify
notation, we introduce the combined random variable $\tilde\omega^t := (\tilde\omega_0, \ldots, \tilde\omega_t)$; its
realizations $\omega^t := (\omega_0, \ldots, \omega_t) \in \Omega^t := \Omega_0 \times \cdots \times \Omega_t$ describe the sequence of
observations or the outcome history up to time $t$ ($\omega^t \in \Omega^t \subset \mathbb R^{M^t}$, where
$M^t := M_0 + \cdots + M_t$). Notice that $\tilde\omega^T$ coincides with the random vector
$\tilde\omega$ defined in (2.1); a similar identity holds for the underlying domain,
i.e. we have $\Omega^T = \Omega$, which is embedded in a Euclidean space of dimension
$M := M^T$. Next, we adopt an analogous notation for the actions. By definition,
$\tilde x^t := (\tilde x_0, \ldots, \tilde x_t)$ represents a collection of measurable functions and
$x^t := (x_0, \ldots, x_t)$ denotes the decision history up to time $t$ ($x^t \in \mathbb R^{n^t}$, where
$n^t := n_0 + \cdots + n_t$). In particular, we introduce the policy function $\tilde x := \tilde x^T$,
which constitutes an $n$-dimensional Borel measurable mapping ($n := n^T$) and
completely determines the underlying decision rule.

² A decision rule is also referred to as a policy or a strategy.

³ Notice that the composed mapping $\tilde x_t \circ \tilde\omega : \hat\Omega \to \mathbb R^{n_t}$ is $\hat{\mathcal F}$-measurable, since $\tilde x_t$ is
assumed to be Borel measurable. This implies that $\tilde x_t \circ \tilde\omega$ is a random variable on the
sample space $\hat\Omega$. Remember, in contrast, that compositions of Lebesgue measurable
functions are generally not Lebesgue measurable.

Let us now characterize non-anticipative decision rules. The natural
causality structure described above, i.e. the requirement that future events
may not influence present decisions, leads to a special functional dependence
of the policy function upon the uncertain parameters. Formally speaking, $\tilde x_t$
must be constant as a function of $\omega_{t+1}, \ldots, \omega_T$ for every $t \in \tau$. The space of
non-anticipative decision rules is thus defined as (see e.g. [96])

$$\mathcal N_n := \mathcal N_{n_0} \times \cdots \times \mathcal N_{n_T},$$

where

$$\mathcal N_{n_t} := \mathcal L^\infty(\Omega, \mathcal F^t, P; \mathbb R^{n_t}) \quad \forall t \in \tau.$$

Obviously, $\mathcal N_n$ is a linear subspace of $\mathcal L^\infty_n := \mathcal L^\infty_{n_0} \times \cdots \times \mathcal L^\infty_{n_T}$. With a slight
abuse of notation,⁴ non-anticipative policies can be represented as

$$\tilde x(\omega) = \big(\tilde x_0(\omega^0), \tilde x_1(\omega^1), \ldots, \tilde x_t(\omega^t), \ldots, \tilde x_T(\omega^T)\big). \qquad (2.3)$$

Notice that any non-anticipative decision process $\{\tilde x_t\}_{t\in\tau}$ is adapted to the
filtration induced by the random variables $\{\tilde\omega_t\}_{t\in\tau}$. Thus, the predicates
‘non-anticipative’ and ‘$\{\mathcal F^t\}$-adapted’ are equivalent characterizations of decision
processes.

In a dynamic setting, the decisions $x_t$ made at time $t$ not only depend on
the past outcomes $\omega^t$ but also on the earlier decisions $x^{t-1}$. In turn, these
previous decisions depend on $(\omega^{t-1}, x^{t-2})$. Tracking the interdependence of
decisions and outcomes inductively until the first stage, one can easily verify
that (2.3) correctly reflects the net dependence on the stochastic parameters.
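On a finite scenario tree, the representation (2.3) can be checked directly: a policy is non-anticipative exactly when scenarios sharing the history $\omega^t$ receive the same stage-$t$ action, for every $t$. The sketch below stores one action vector per scenario path. This data layout and the function name are assumptions for illustration.

```python
def is_nonanticipative(paths, policy):
    """policy[i][t] is the stage-t action on scenario path i. The policy is
    non-anticipative iff, for every t, paths sharing the history
    (w_0, ..., w_t) receive the same stage-t action."""
    for t in range(len(paths[0])):
        action_at = {}
        for path, actions in zip(paths, policy):
            history = tuple(path[: t + 1])
            if action_at.setdefault(history, actions[t]) != actions[t]:
                return False
    return True


paths = [(0, 1), (0, 2)]                                 # hypothetical two-stage tree
print(is_nonanticipative(paths, [(5, 1), (5, 2)]))       # True
print(is_nonanticipative(paths, [(1, 1), (2, 2)]))       # False: x_0 depends on w_1
```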



2.3 Constraints 

In broad problem classes the admissible actions at time $t$ are subject to
restrictions, which may depend on earlier decisions and on observations up to
time $t$. Such restrictions are conveniently captured via vector-valued constraint
functions. To each decision stage $t \in \tau$ we assign three Borel measurable
constraint functions

$$f_t^{\mathrm{in}} : \mathbb R^{n^t} \times \mathbb R^{M^t} \to \mathbb R^{r_t^{\mathrm{in}}}, \qquad f_t^{\mathrm{eq}} : \mathbb R^{n^t} \times \mathbb R^{M^t} \to \mathbb R^{r_t^{\mathrm{eq}}}, \qquad f_t : \mathbb R^{n^t} \times \mathbb R^{M^t} \to \mathbb R^{r_t}.$$



Here, it is assumed that $f_t^{\mathrm{in}}$ characterizes ‘pure’ inequality constraints,
whereas $f_t^{\mathrm{eq}}$ determines possible equality constraints. By definition, the
mapping $f_t := (f_t^{\mathrm{in}}, f_t^{\mathrm{eq}}, -f_t^{\mathrm{eq}})$ accounts for both types of constraints. Then,
consistency requires the dimensions to match up properly, i.e. $r_t := r_t^{\mathrm{in}} + 2r_t^{\mathrm{eq}}$
for all $t \in \tau$.

⁴ There is a natural injection from the essentially bounded functions on $\Omega^t$ to
the essentially bounded functions on $\Omega$.

The sets of feasible first stage decisions are given through

$$X_0(\omega_0) := \{x_0 \in \mathbb R^{n_0} \mid f_0(x_0, \omega_0) \le 0\}, \qquad (2.4a)$$

whereas the feasible sets of the recourse decisions are defined as 

$$X_t(x^{t-1}, \omega^t) := \{x_t \in \mathbb R^{n_t} \mid f_t(x^t, \omega^t) \le 0\} \quad \text{for } t \in \tau_{-0}. \qquad (2.4b)$$

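The above specifications allow for a uniform treatment of equality and
inequality constraints, since any equation of the form $f_t^{\mathrm{eq}} = 0$ is equivalent to
two opposing inequalities $f_t^{\mathrm{eq}} \le 0$ and $-f_t^{\mathrm{eq}} \le 0$. Notice that the feasible set
mappings

$$\omega_0 \mapsto X_0(\omega_0) \quad \text{and} \quad (x^{t-1}, \omega^t) \mapsto X_t(x^{t-1}, \omega^t) \quad \text{for } t \in \tau_{-0}$$

can be viewed as set-valued functions or multifunctions. For any $t \in \tau$ the
nested feasible set mapping $X^t$ is defined through

$$X^t : \Omega^t \to 2^{\mathbb R^{n^t}}, \quad \omega^t \mapsto \{x^t \in \mathbb R^{n^t} \mid x_0 \in X_0(\omega_0), \ldots, x_t \in X_t(x^{t-1}, \omega^t)\}.$$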
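The stacking $f_t = (f_t^{\mathrm{in}}, f_t^{\mathrm{eq}}, -f_t^{\mathrm{eq}})$ used above can be sketched as follows. The constraint functions in the example (nonnegativity of two decision variables and one balance equation) are hypothetical, and the feasibility test uses a small tolerance, an added assumption, to absorb floating-point rounding in the equality constraint.

```python
def stack_constraints(f_in, f_eq):
    """Return f_t = (f_in, f_eq, -f_eq), so that f_t(x, w) <= 0 componentwise
    holds iff f_in(x, w) <= 0 and f_eq(x, w) = 0."""
    def f_t(x, w):
        eq = list(f_eq(x, w))
        return list(f_in(x, w)) + eq + [-v for v in eq]
    return f_t


def is_feasible(f_t, x, w, tol=1e-9):
    """Check f_t(x, w) <= tol componentwise (tol absorbs rounding errors)."""
    return all(v <= tol for v in f_t(x, w))


# Hypothetical stage constraints: x = (x0, x1) >= 0 and x0 + x1 = w for scalar w.
f = stack_constraints(lambda x, w: [-x[0], -x[1]],
                      lambda x, w: [x[0] + x[1] - w])
print(len(f((0.3, 0.7), 1.0)))            # 4 = r_in + 2 * r_eq
print(is_feasible(f, (0.3, 0.7), 1.0))    # True
print(is_feasible(f, (0.5, 0.7), 1.0))    # False: the equality is violated
```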

By construction, the values of $X^t$ are closed subsets of $\mathbb R^{n^t}$ if the constraint
functions are lower semicontinuous in the decision variables. Moreover, $X^t$
constitutes an $\mathcal F^t$-measurable multifunction if the constraint functions are
normal integrands in the sense of [91, Theorem 2J]. The set-valued mapping
$X := X^T$ is referred to as constraint multifunction in the literature, see e.g.
Rockafellar and Wets [92]. In case of a well-defined decision problem, the
multifunction $X_t$ is non-empty-valued for any thinkable outcome and decision
history ($t \in \tau$). Formally speaking, for any possible sequence of outcomes
$\omega^t \in \Omega^t$ and for any admissible decision history $x^{t-1} \in X^{t-1}(\omega^{t-1})$ we require
$X_t(x^{t-1}, \omega^t) \neq \emptyset$. As can easily be verified, this characterization of
well-definedness is equivalent to the condition

$$\{x^t \mid x \in X(\omega)\} = X^t(\omega^t) \quad \forall\, \omega \in \Omega,\ t \in \tau. \qquad (2.5)$$

Equation (2.5) implies that the projection of $X(\omega)$ on the action space of
the first $t$ stages only depends on $\omega^t$. Therefore, a constraint multifunction
which satisfies condition (2.5) is called non-anticipative [92]. In particular, this
condition implies $X$ to be non-empty-valued on the entire probability space
$\Omega$ if the first stage feasible set $X_0(\omega_0)$ is non-empty for all $\omega_0 \in \Omega_0$. If (2.5)
fails to be true, all decisions that lead to infeasibilities in certain scenarios of
future time stages must a priori be excluded by so-called induced constraints.
Notice that every constraint multifunction can be made non-anticipative by
introducing explicitly the induced constraints, as pointed out in [94,107]. In
this work we shall confine ourselves to non-anticipative decision problems since
induced constraints are acausal and should be avoided in realistic models.
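For problems with finitely many scenarios, the requirement behind (2.5) can be tested by brute force: enumerate every feasible history $x^{t-1} \in X^{t-1}(\omega^{t-1})$ and check that $X_t(x^{t-1}, \omega^t)$ is non-empty. The sketch below does this on a finite action grid, which is an assumption. On a grid, a reported pair may be an artifact of the discretization, and an empty result does not certify (2.5) for continuous decisions. The example problem and all names are hypothetical.

```python
def nested_feasible(t, w, actions, feasible):
    """X^t(w^t) restricted to a finite action grid, as a list of tuples x^t.
    feasible(s, x, w) tests f_s(x^s, w^s) <= 0 (hypothetical callback)."""
    histories = [()]
    for s in range(t + 1):
        histories = [x + (a,) for x in histories for a in actions
                     if feasible(s, x + (a,), w[: s + 1])]
    return histories


def find_missing_recourse(T, actions, outcome_histories, feasible):
    """Return a pair (x^{t-1}, w^t) with x^{t-1} in X^{t-1}(w^{t-1}) but
    X_t(x^{t-1}, w^t) empty on the grid, or None if no such pair exists."""
    for t in range(1, T + 1):
        for w in outcome_histories[t]:
            for x_prev in nested_feasible(t - 1, w[:t], actions, feasible):
                if not any(feasible(t, x_prev + (a,), w) for a in actions):
                    return x_prev, w
    return None


# Hypothetical two-stage example: buy x_0 >= 0 now; the recourse x_1 >= 0 must
# cover the remaining demand w_1 exactly (x_0 + x_1 == w_1) or at least (>=).
def exact_cover(s, x, w):
    return x[s] >= 0 and (s == 0 or x[0] + x[1] == w[1])


def over_cover(s, x, w):
    return x[s] >= 0 and (s == 0 or x[0] + x[1] >= w[1])


histories = {1: [(0, 1), (0, 2)]}
print(find_missing_recourse(1, (0, 1, 2), histories, exact_cover))  # ((2,), (0, 1))
print(find_missing_recourse(1, (0, 1, 2), histories, over_cover))   # None
```

In the first example, ordering $x_0 = 2$ when the demand turns out to be $\omega_1 = 1$ leaves no feasible recourse, so the induced constraint $x_0 \le 1$ would have to be added; relaxing the balance equation to an inequality removes the problem.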
One might be tempted to relax condition (2.5) to hold only on the support
of the probability measure $P$ instead of the entire space $\Omega$ (note that $\operatorname{supp} P$
can be a strict subset of $\Omega$). However, the scenario generation method to be
developed in Chap. 4 relies on the assumption that the constraint multifunction
satisfies condition (2.5) for specific scenarios in $\Omega \setminus \operatorname{supp} P$. As a matter of
fact, for most problems of practical interest this extra requirement is no real
restriction.




Fig. 2.3. $Y^t$ is the graph of the multifunction $X_t$ over $Z^t$ and lies in the level set
$\operatorname{lev}_{\le 0} f_t$. The image of $Z^t$ under $X_t$ is given by the projection of $Y^t$ on $\mathbb R^{n_t}$



When dealing with constrained multistage stochastic programs, it is
sometimes useful to introduce a sequence of generalized feasible sets $Y^t$ comprising
the joint outcome and decision histories feasible up to time $t$.

$$Y^t := \{(x^t, \omega^t) \mid \omega^t \in \Omega^t,\ x^t \in X^t(\omega^t)\} \quad \forall t \in \tau \qquad (2.6)$$

For the sake of transparent notation, we need the related generalized feasible
sets $Z^t$ containing no stage $t$ decisions. Define $Z^0 := \Omega_0$ and

$$Z^t := \{(x^{t-1}, \omega^t) \mid \omega^t \in \Omega^t,\ x^{t-1} \in X^{t-1}(\omega^{t-1})\} \quad \forall t \in \tau_{-0}. \qquad (2.7)$$

Notice that $Y^t$ can be considered as the graph of the multifunction $X^t$ over $\Omega^t$
or as the graph of $X_t$ over $Z^t$, as sketched in Fig. 2.3. Moreover, as indicated
in the illustration, $Y^t$ lies within the level set $\operatorname{lev}_{\le 0} f_t$, where

$$\operatorname{lev}_{\le 0} f_t := \{(x^t, \omega^t) \mid f_t(x^t, \omega^t) \le 0\}.$$

The generalized feasible sets $Y^t$ and $Z^t$ are completely determined by the
constraint functions, and their importance will become clear in the next sections.
2.4 Static and Dynamic Version of a Stochastic Program



In stochastic programming it is usually assumed that decisions are selected
in order to maximize the expected value of some time-separable objective
function. The intertemporal contributions to the objective, $\{\rho_t\}_{t\in\tau}$, depend on
the outcome and decision history and can be interpreted as a sequence of profit
functions. As a minimal requirement, the mappings $\rho_t : \mathbb R^{n^t} \times \mathbb R^{M^t} \to \mathbb R$ ($t \in \tau$)
must be Borel measurable. Collecting the above definitions, the static version
of a general (non-linear) multistage stochastic program can be formulated as
follows:

$$\begin{aligned}
\sup_{\tilde x \in \mathcal N_n}\ & \int_\Omega \sum_{t=0}^{T} \rho_t\big(\tilde x^t(\omega^t), \omega^t\big)\, dP(\omega) \\
\text{s.t.}\ & f_t\big(\tilde x^t(\omega^t), \omega^t\big) \le 0 \quad P\text{-a.s.},\ t \in \tau.
\end{aligned} \qquad (2.8)$$



For theoretical considerations it is sometimes useful to incorporate the explicit restrictions into the objective function. To this end we introduce the effective profit functions $\{\bar p_t\}_{t\in\tau}$, which are defined as extended-real-valued mappings:

$$\bar p_t(x^t,\omega^t) := \begin{cases} p_t(x^t,\omega^t) & \text{for } f_t(x^t,\omega^t) \le 0, \\ -\infty & \text{else.} \end{cases} \tag{2.9}$$
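The construction (2.9) is easy to mirror in code. The Python sketch below, which uses hypothetical helper names and a made-up two-stage example, wraps a profit function and a vector-valued constraint function into the effective profit $\bar p_t$; the scenario-wise sum of effective profits is the integrand that appears in (2.10), and a single violated constraint drives it to $-\infty$.

```python
import math
from typing import Callable, List, Sequence

Vec = Sequence[float]
Profit = Callable[[Vec, Vec], float]
Constraint = Callable[[Vec, Vec], Sequence[float]]

def effective_profit(p: Profit, f: Constraint) -> Profit:
    """Return bar p_t from (2.9): p_t where every component of f_t is <= 0, -inf elsewhere."""
    def p_bar(x: Vec, w: Vec) -> float:
        return p(x, w) if max(f(x, w)) <= 0.0 else -math.inf
    return p_bar

def scenario_objective(p_bars: List[Profit], x: Vec, w: Vec) -> float:
    """Sum over t of bar p_t(x^t, w^t) for one scenario (x, w hold one scalar per stage)."""
    return sum(p_bar(x[: t + 1], w[: t + 1]) for t, p_bar in enumerate(p_bars))

# Hypothetical two-stage example: pay x_0 in [0, 1], then earn w_1 * x_1 with 0 <= x_1 <= x_0.
p0 = effective_profit(lambda x, w: -x[0], lambda x, w: (x[0] - 1.0, -x[0]))
p1 = effective_profit(lambda x, w: w[1] * x[1], lambda x, w: (x[1] - x[0], -x[1]))
```

Because $-\infty$ absorbs every finite summand, a policy that violates a constraint on a set of positive probability receives the objective value $-\infty$ in (2.10), provided the remaining terms are bounded above.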



With (2.9) the stochastic program (2.8) reduces to

$$\sup_{x \in \mathcal{N}_n} \int_\Omega \sum_{t=0}^{T} \bar p_t(x^t(\omega^t),\omega^t)\, dP(\omega). \tag{2.10}$$



Notice that (2.10) and (2.8) are indeed equivalent since the explicit constraints 
in (2.8) are assumed to hold almost surely. On the other hand, allowing for 
policy functions which violate the restrictions on $P$-null sets has no undesirable effect on the optimal solution. In fact, after a suitable redefinition on a set of measure zero, any feasible policy function satisfies the explicit restrictions everywhere, and any such redefinition leaves the objective function value unchanged.

Sometimes it is more convenient to work with the dynamic version of a given multistage stochastic program, which is defined recursively. The parametric family of stage $T$ subproblems is given through

$$\Phi_T(x^{T-1},\omega^T) := \sup_{x_T \in \mathbb{R}^{n_T}} \bar p_T(x^T,\omega^T), \tag{2.11a}$$

whereas the subordinate maximization problems for $t = T-1,\ldots,1$ depend on the optimal value functions of the respective subsequent stages:

$$\Phi_t(x^{t-1},\omega^t) := \sup_{x_t \in \mathbb{R}^{n_t}}\ \bar p_t(x^t,\omega^t) + \int_{\Omega_{t+1}} \Phi_{t+1}(x^t,\omega^{t+1})\, dP_{t+1}(\omega_{t+1} \mid \omega^t). \tag{2.11b}$$
The optimal value function of the subordinate zero stage problem depends only on the random variable $\omega_0$, which is usually assumed to be deterministic:

$$\Phi_0(\omega_0) := \sup_{x_0 \in \mathbb{R}^{n_0}}\ \bar p_0(x_0,\omega_0) + \int_{\Omega_1} \Phi_1(x_0,\omega^1)\, dP_1(\omega_1 \mid \omega_0). \tag{2.11c}$$

Most importantly, the optimal value $(E\Phi_0) := \int \Phi_0\, dP_0$ of the dynamic program (2.11) is given by the unconditional expectation of the value function $\Phi_0$. If $\omega_0$ is deterministic, the calculation of $(E\Phi_0)$ reduces to the evaluation of $\Phi_0$ at one single point.
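For a finite outcome space the recursion (2.11) can be evaluated directly. The following Python sketch is an illustration under simplifying assumptions that are not part of the text: every $P_t(\cdot \mid \omega^{t-1})$ has finitely many atoms with positive probabilities, and each supremum over $\mathbb{R}^{n_t}$ is replaced by a maximum over a finite grid of scalar decisions. The function names and the toy data are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

History = Tuple[float, ...]
Atoms = List[Tuple[float, float]]  # (outcome w_t, probability) pairs

def recourse(t: int, x_prev: History, w_hist: History, T: int,
             grid: Sequence[float],
             p_bar: Callable[[int, History, History], float],
             cond: Callable[[int, History], Atoms]) -> float:
    """Phi_t(x^{t-1}, w^t) from (2.11), with the sup taken over a finite grid."""
    best = -math.inf
    for x_t in grid:
        x_hist = x_prev + (x_t,)
        value = p_bar(t, x_hist, w_hist)
        if value == -math.inf:
            continue  # x_t violates a stage-t constraint
        if t < T:
            # (E_t Phi_{t+1})(x^t, w^t) as in (2.12): weighted sum over the atoms
            value += sum(prob * recourse(t + 1, x_hist, w_hist + (w,), T, grid, p_bar, cond)
                         for w, prob in cond(t + 1, w_hist))
        best = max(best, value)
    return best

# Hypothetical two-stage example: buy x_0 at unit cost 0.25, then sell x_1 <= min(x_0, w_1).
GRID = (0.0, 0.25, 0.5, 0.75, 1.0)

def toy_p_bar(t: int, x: History, w: History) -> float:
    if t == 0:
        return -0.25 * x[0]
    return x[1] if 0.0 <= x[1] <= min(x[0], w[1]) else -math.inf

def toy_cond(t: int, w_hist: History) -> Atoms:
    return [(0.0, 0.5), (2.0, 0.5)]  # demand w_1 is 0 or 2 with equal probability

value = recourse(0, (), (1.0,), 1, GRID, toy_p_bar, toy_cond)  # (E Phi_0) for w_0 = 1
```

Here $\Phi_1(x_0,\omega^1) = \min\{x_0,\omega_1\}$, so $\Phi_0$ maximizes $0.25\,x_0$ over the grid and equals $0.25$ at $x_0 = 1$.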

In the above problem formulation, $P_t$ denotes the regular conditional probability distribution of $\omega_t$ given $\omega^{t-1}$ (for $t \in \tau_{-0}$), and $P_0$ stands for the marginal distribution of $\omega_0$. Notice that the probability measure $P_t(\cdot \mid \omega^{t-1})$ does not depend on the decision history $x^{t-1}$, implying that the uncertain impacts are completely exogenous and cannot be influenced by the decision maker. It should be remarked that the maximization in problem (2.8) is performed over all feasible policy functions $x \in \mathcal{N}_n$, whereas the recursive formulation (2.11) represents a large collection of simpler optimization problems over finite-dimensional Euclidean spaces. The optimal value function of stage $t$ is an extended-real-valued mapping $\Phi_t: \mathbb{R}^{n^{t-1}} \times \mathbb{R}^{M^t} \to [-\infty,\infty]$, which is usually referred to as the recourse function in the literature [10,61]. Furthermore, in order to simplify notation, we define the expectation functional

$$(E_t\Phi_{t+1})(x^t,\omega^t) := \int_{\Omega_{t+1}} \Phi_{t+1}(x^t,\omega^{t+1})\, dP_{t+1}(\omega_{t+1} \mid \omega^t) \tag{2.12}$$

as the expectation of the recourse function $\Phi_{t+1}$ conditional on the information available at stage $t$, $t \in \tau_{-T}$.

Before choosing an action at stage $t$, the decision maker knows the outcome and decision history $(x^{t-1},\omega^t) \in Z^t$. Vectors in the complement of $Z^t$ correspond to sequences of impossible outcomes or forbidden actions; in principle, they need not be considered. Thus, only the restriction of $\Phi_t$ to $Z^t$ has physical meaning, and $Z^t$ shall be referred to as the natural domain of the recourse function $\Phi_t$. Moreover, it is important to realize that changes of $p_t$ on the complement of $Y^t$ do not influence the restriction of $\Phi_s$ to $Z^s$, $s = 0,\ldots,t$. Therefore, $Y^t$ can conveniently be interpreted as the natural domain of the profit function $p_t$ and the expectation functional $(E_t\Phi_{t+1})$.

It is intuitively appealing that the static and dynamic versions of a multi- 
stage stochastic program are equivalent in the sense that their optimal values 
coincide. However, without adequate regularity conditions it is not even clear 
whether the integrals in (2.11b) and (2.11c) are well-defined. Therefore, in a first step, we must establish that the problems (2.8) and (2.11) are well-defined. Once well-definedness is established, the static and dynamic versions can be shown to be equivalent and solvable.
For the precise formulation of appropriate regularity conditions we need the following conventions.

Definition 2.1. We call $\{\omega_t\}_{t\in\tau}$ a nonlinear autoregressive process if for every index $t \in \tau_{-0}$ there is a random vector $\varepsilon_t$ with induced probability space $(\mathcal{E}_t, \mathcal{B}(\mathcal{E}_t), Q_t)$ and a transformation function $H_t: \Omega^{t-1} \times \mathcal{E}_t \to \Omega_t$ such that

$$\omega_t = H_t(\omega^{t-1}, \varepsilon_t).$$

The initial data $\omega_0$ and the disturbances $\{\varepsilon_t\}_{t\in\tau_{-0}}$ are mutually independent, and for each $t \in \tau_{-0}$ we postulate:

- $\mathcal{E}_t$ is a non-empty Borel subset of a finite-dimensional Euclidean space;

- $H_t$ is a Borel measurable map;

- $Q_t$ is a regular probability measure on $(\mathcal{E}_t, \mathcal{B}(\mathcal{E}_t))$.



Definition 2.1 is inspired by a similar concept in the field of nonlinear time series analysis [105, Chap. 3]. Note that if $\{\omega_t\}_{t\in\tau}$ follows a nonlinear autoregressive process, then the conditional probability distribution $P_t$ is defined through

$$P_t(A \mid \omega^{t-1}) = Q_t(\{\varepsilon_t \mid H_t(\omega^{t-1},\varepsilon_t) \in A\}) \quad \forall A \in \mathcal{B}(\Omega_t),\ \omega^{t-1} \in \Omega^{t-1}.$$
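A nonlinear autoregressive process is straightforward to simulate, and the displayed formula for $P_t$ suggests a Monte Carlo estimate of conditional probabilities: draw disturbances from $Q_t$ and count how often $H_t(\omega^{t-1},\varepsilon_t)$ lands in $A$. The Python sketch below assumes, purely for illustration, the hypothetical transformation $H_t(\omega^{t-1},\varepsilon_t) = 0.5\,\omega_{t-1} + \varepsilon_t$ with $\varepsilon_t$ uniform on $[0,1)$; it uses only the standard `random` module, and the helper names are invented.

```python
import random
from typing import Callable, List, Sequence

Transform = Callable[[int, Sequence[float], float], float]   # H_t(w^{t-1}, eps_t)
Sampler = Callable[[random.Random], float]                  # draws eps_t ~ Q_t

def simulate_path(w0: float, H: Transform, sample_eps: Sampler, T: int,
                  rng: random.Random) -> List[float]:
    """Draw (w_0, ..., w_T) with w_t = H_t(w^{t-1}, eps_t) and independent disturbances."""
    path = [w0]
    for t in range(1, T + 1):
        path.append(H(t, tuple(path), sample_eps(rng)))
    return path

def cond_prob(in_A: Callable[[float], bool], w_prev: Sequence[float], t: int,
              H: Transform, sample_eps: Sampler, n: int, rng: random.Random) -> float:
    """Monte Carlo estimate of P_t(A | w^{t-1}) = Q_t({eps : H_t(w^{t-1}, eps) in A})."""
    hits = sum(1 for _ in range(n) if in_A(H(t, w_prev, sample_eps(rng))))
    return hits / n

# Hypothetical example: H_t(w^{t-1}, eps) = 0.5 * w_{t-1} + eps with eps ~ Uniform[0, 1).
def H(t: int, w_prev: Sequence[float], eps: float) -> float:
    return 0.5 * w_prev[-1] + eps

def uniform_eps(rng: random.Random) -> float:
    return rng.random()
```

Starting from $\omega_0 = 1$, every outcome of this toy process stays in $[0,2]$, so $\Omega_t = [0,2]$ is a compact set covering the supports. Since $\omega_1 = 0.5 + \varepsilon_1$, the exact value of $P_1(\{\omega_1 \le 1\} \mid \omega_0 = 1)$ is $0.5$, which the Monte Carlo estimate approximates.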

Definition 2.2. A set-valued mapping $X: \Omega \subset \mathbb{R}^M \to 2^{\mathbb{R}^n}$ is said to be

(i) upper semicontinuous (usc) at $\omega$ if for each open set $U$ with $X(\omega) \subset U$ the set $\{\omega' \in \Omega \mid X(\omega') \subset U\}$ is a neighborhood of $\omega$;

(ii) lower semicontinuous (lsc) at $\omega$ if for each open set $U$ with $X(\omega) \cap U \ne \emptyset$ the set $\{\omega' \in \Omega \mid X(\omega') \cap U \ne \emptyset\}$ is a neighborhood of $\omega$;

(iii) continuous at $\omega$ if it is both usc and lsc at $\omega$;

(iv) usc (lsc, continuous) on $\Omega$ if it is usc (lsc, continuous) at $\omega$ for every $\omega \in \Omega$;

(v) bounded on $\Omega$ if the image $X(\Omega)$ is a bounded subset of $\mathbb{R}^n$.

Definition 2.2 is consistent with Border [13, definition 11.3]. In that reference, 
however, semicontinuity of multifunctions is referred to as ‘hemicontinuity’, in 
order to avoid confusion with semicontinuity of ordinary functions. For more 
details about continuity properties of multifunctions, we refer to the concise 
treatment by Rockafellar and Wets [97, Chap. 5]. 
Returning to the stochastic program (2.8), we will impose the following regularity conditions:



(A1) $\Omega_t$ is compact and covers the support of $\omega_t$ for all $t \in \tau$;

(A2) the profit function $p_t$ is usc and bounded on $Y^t$ for all $t \in \tau$;

(A3) the feasible set mapping $X_t$ is usc and bounded on $Z^t$ for all $t \in \tau$;

(A4) the random data $\{\omega_t\}_{t\in\tau}$ follow a nonlinear autoregressive process; $H_t$ is a Carathéodory map, i.e. it is continuous in $\omega^{t-1}$ and (Borel) measurable in $\varepsilon_t$, while $\mathcal{E}_t$ is compact for all $t \in \tau_{-0}$;

(A5) the multifunction $X_t$ is non-empty-closed-valued on $Z^t$ for all $t \in \tau$.



The conditions (A1), (A2), (A3), and (A4) are unproblematic for many practical applications (see also proposition 2.3 below). Assumption (A5) ensures non-anticipativity of the constraint multifunction. The following proposition suggests a simple characterization of closed-valued feasible set mappings which are usc and bounded.

Proposition 2.3. Given condition (A1), assumption (A3) is equivalent to compactness of the generalized feasible sets $\{Y^t\}_{t\in\tau}$ and $\{Z^t\}_{t\in\tau}$.

Proof. First, we prove necessity by induction on $t$. The basis step is trivial since $Z^0 = \Omega_0$ is compact as implied by assumption (A1). Next, assume that $Z^t$ is compact for some $t \in \tau$. Then, assumption (A3) postulates boundedness of $X_t(Z^t)$. Thus, the generalized feasible set

$$Y^t = \operatorname{graph} X_t|_{Z^t} \subset Z^t \times X_t(Z^t)$$

is bounded. Moreover, as $X_t$ is usc and closed-valued, we may invoke proposition 11.9 (a) of Border [13] to prove closedness of $Y^t$. Boundedness and closedness entail compactness of $Y^t$. If $t < T$, by assumption (A1) we may then conclude that $Z^{t+1} = Y^t \times \Omega_{t+1}$ is compact, too. This observation completes the induction step.

Next, we prove sufficiency. By assumption, the graph of the multifunction $X_t$ over $Z^t$ is compact. Therefore, the image of $X_t$ over $Z^t$ is bounded, and $X_t$ is usc due to [13, proposition 11.9 (b)]. Hence, condition (A3) follows. □

Corollary 2.4. Given condition (A1), assumption (A3) holds true if the constraint function $f_t$ is continuous and the set $\operatorname{lev}_{\le 0} f_t$ is bounded for all $t \in \tau$.

Proposition 2.5. Under the assumptions (A1)-(A5) the recourse function $\Phi_t$ is usc and bounded on $Z^t$ for all $t \in \tau$. A fortiori, the expectation functional $(E_t\Phi_{t+1})$ is well-defined, finite, and usc on $Y^t$ for $t \in \tau_{-T}$, and $(E\Phi_0)$ is finite.
Proof. Denote by $\underline{\kappa}_t$ and $\overline{\kappa}_t$ finite lower and upper bounds of $p_t$ on $Y^t$. Such bounds exist by assumption (A2). The claim is then proved by backward induction with respect to $t$. By condition (A2), the effective profit function $\bar p_T$ is usc and bounded on $Y^T$. For $(x^{T-1},\omega^T) \in Z^T$ the recourse function of stage $T$ is given by

$$\Phi_T(x^{T-1},\omega^T) = \sup_{x_T \in \mathbb{R}^{n_T}} \bar p_T(x^T,\omega^T) = \sup_{x_T \in X_T(Z^T)} \bar p_T(x^T,\omega^T).$$

Thus, by [97, theorem 1.17] the optimal value function $\Phi_T$ is usc on $Z^T$ since $X_T(Z^T)$ is compact. Notice that $X_T(Z^T)$ is the projection of $Y^T$ on $\mathbb{R}^{n_T}$, which is compact by proposition 2.3. Assumption (A5) and the definition of the recourse function $\Phi_T$ directly yield the inequality $\underline{\kappa}_T \le \Phi_T \le \overline{\kappa}_T$ on $Z^T$. Thus, the basis step is established.

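Assume next that the recourse function $\Phi_{t+1}$ is usc and bounded on $Z^{t+1}$, i.e. assume $\sum_{s=t+1}^T \underline{\kappa}_s \le \Phi_{t+1} \le \sum_{s=t+1}^T \overline{\kappa}_s$. Upper semicontinuity entails Borel measurability of the mapping $\omega_{t+1} \mapsto \Phi_{t+1}(x^t,\omega^{t+1})$. Consequently, the expectation functional $(E_t\Phi_{t+1})$ is well-defined. Choosing an arbitrary convergent sequence $\{(x^t_k,\omega^t_k)\}_{k\in\mathbb{N}}$ in $Y^t$ with a limiting point $(x^t,\omega^t) \in Y^t$ we find (writing $\Phi_{t+1}(x^t,\omega^t,\omega_{t+1})$ for $\Phi_{t+1}(x^t,\omega^{t+1})$)

$$\begin{aligned}
\limsup_{k\to\infty} (E_t\Phi_{t+1})(x^t_k,\omega^t_k)
&= \limsup_{k\to\infty} \int_{\Omega_{t+1}} \Phi_{t+1}(x^t_k,\omega^t_k,\omega_{t+1})\, dP_{t+1}(\omega_{t+1} \mid \omega^t_k) \\
&= \limsup_{k\to\infty} \int_{\mathcal{E}_{t+1}} \Phi_{t+1}\big(x^t_k,\omega^t_k,H_{t+1}(\omega^t_k,\varepsilon_{t+1})\big)\, dQ_{t+1}(\varepsilon_{t+1}) \\
&\le \int_{\mathcal{E}_{t+1}} \limsup_{k\to\infty} \Phi_{t+1}\big(x^t_k,\omega^t_k,H_{t+1}(\omega^t_k,\varepsilon_{t+1})\big)\, dQ_{t+1}(\varepsilon_{t+1}) \\
&\le \int_{\mathcal{E}_{t+1}} \Phi_{t+1}\big(x^t,\omega^t,H_{t+1}(\omega^t,\varepsilon_{t+1})\big)\, dQ_{t+1}(\varepsilon_{t+1}) \\
&= (E_t\Phi_{t+1})(x^t,\omega^t).
\end{aligned}$$

The inequalities hold because of Fatou's lemma (which applies since $\Phi_{t+1}$ is bounded from above) and the upper semicontinuity of $\Phi_{t+1}$ in conjunction with the continuity of $\omega^t \mapsto H_{t+1}(\omega^t,\varepsilon_{t+1})$. Therefore, $(E_t\Phi_{t+1})$ is usc on $Y^t$. Moreover, the definition of the expectation functional and the induction hypothesis imply that $\sum_{s=t+1}^T \underline{\kappa}_s \le (E_t\Phi_{t+1}) \le \sum_{s=t+1}^T \overline{\kappa}_s$ on $Y^t$.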

By assumption, the effective profit function $\bar p_t$ is usc and bounded on $Y^t$. For $(x^{t-1},\omega^t) \in Z^t$ the recourse function of stage $t$ is given by

$$\begin{aligned}
\Phi_t(x^{t-1},\omega^t) &= \sup_{x_t \in \mathbb{R}^{n_t}} \bar p_t(x^t,\omega^t) + (E_t\Phi_{t+1})(x^t,\omega^t) \\
&= \sup_{x_t \in X_t(Z^t)} \bar p_t(x^t,\omega^t) + (E_t\Phi_{t+1})(x^t,\omega^t).
\end{aligned}$$

Thus, by [97, theorem 1.17] $\Phi_t$ is usc on $Z^t$ since $X_t(Z^t)$ is compact (notice that $X_t(Z^t)$ is the projection of $Y^t$ on $\mathbb{R}^{n_t}$, which is compact by proposition 2.3; see also Fig. 2.3). Assumption (A5) and the definition of the recourse function $\Phi_t$ directly yield the inequality

$$\sum_{s=t}^T \underline{\kappa}_s \le \Phi_t \le \sum_{s=t}^T \overline{\kappa}_s \quad \text{on } Z^t.$$

By induction, $\Phi_t$ is usc and bounded for every $t \in \tau$, and $(E_t\Phi_{t+1})$ is usc and bounded for $t \in \tau_{-T}$. Finally, $\Phi_0$ is bounded and, being usc, Borel measurable; hence the optimal value $(E\Phi_0) = \int \Phi_0\, dP_0$ is finite. □

Theorem 2.6. Under the assumptions (A1)-(A5) the static and dynamic versions of a multistage stochastic program are both solvable, and the optimal values coincide.



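Proof. Let $(\Omega^t, \mathcal{B}(\Omega^t), P^t)$ be the marginal probability space associated with the random variables appearing in the stages $0,\ldots,t$. Moreover, define an auxiliary function

$$\hat p_t(x^t,\omega^t) := \begin{cases} \sum_{s=0}^{t} \bar p_s(x^s,\omega^s) + (E_t\Phi_{t+1})(x^t,\omega^t) & \text{for } t < T, \\ \sum_{s=0}^{T} \bar p_s(x^s,\omega^s) & \text{for } t = T. \end{cases}$$

We use induction on $t$ to prove that

$$\sup_{x^t \in \mathcal{N}_{n^t}} \int_{\Omega^t} \hat p_t(x^t(\omega^t),\omega^t)\, dP^t(\omega^t) \tag{2.13}$$

is solvable on the space of non-anticipative policies$^{5}$ up to time $t$, $\mathcal{N}_{n^t} := \times_{s=0}^{t} \mathcal{N}_{n_s}$, and that the optimal value is given by $(E\Phi_0)$. Beginning at stage $t = 0$, we may reformulate (2.11c) and calculate the unconditional expectation value to obtain

$$(E\Phi_0) = \int_{\Omega_0} \sup_{x_0 \in \mathbb{R}^{n_0}} \hat p_0(x_0,\omega_0)\, dP_0(\omega_0).$$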

By proposition 2.5 the auxiliary map $\hat p_0$ is usc and finite on $Y^0$. Furthermore, by the definition of the effective profit functions we have $\hat p_0(x_0,\omega_0) = -\infty$ for $(x_0,\omega_0) \notin Y^0$. Thus, $\hat p_0$ is usc on $\mathbb{R}^{n_0} \times \Omega_0$ and its effective domain is given by $\operatorname{dom} \hat p_0 = Y^0$. These results can be used to show that $\hat p_0$ is a normal integrand in the sense of [91]. Concretely speaking, we must prove that the hypograph multifunction $\omega_0 \mapsto \mathcal{H}_{\hat p_0}(\omega_0) := \operatorname{hypo} \hat p_0(\cdot,\omega_0)$ is measurable and closed-valued. Obviously, $\mathcal{H}_{\hat p_0}$ is closed-valued as a direct consequence of the upper semicontinuity of the mapping $x_0 \mapsto \hat p_0(x_0,\omega_0)$. To prove measurability, it is sufficient to verify that

$$\mathcal{H}_{\hat p_0}^{-1}(C) := \{\omega_0 \in \Omega_0 \mid \mathcal{H}_{\hat p_0}(\omega_0) \cap C \ne \emptyset\}$$

is Borel measurable for all compact sets $C \subset \mathbb{R}^{n_0} \times \mathbb{R}$ [91, proposition 1A]. Recall that the hypograph of $\hat p_0$, which coincides with the graph of the multifunction $\mathcal{H}_{\hat p_0}$, is closed since $\hat p_0$ is usc. In addition, notice that $\mathcal{H}_{\hat p_0}^{-1}(C)$ is given by the projection of $(C \times \Omega_0) \cap \operatorname{graph} \mathcal{H}_{\hat p_0}$ on $\Omega_0$. Then, compactness of $C \times \Omega_0$ implies that $(C \times \Omega_0) \cap \operatorname{graph} \mathcal{H}_{\hat p_0}$ is compact, and $\mathcal{H}_{\hat p_0}^{-1}(C)$ is compact and Borel measurable as a projection of a compact set. Therefore, $\hat p_0$ is a normal integrand on $\mathbb{R}^{n_0} \times \Omega_0$.

$^{5}$ Note that $\mathcal{N}_{n_t}$ is naturally isomorphic to $\mathcal{L}^\infty(\Omega^t, \mathcal{B}(\Omega^t), P^t; \mathbb{R}^{n_t})$. Hence, these spaces will be identified in the remainder.

For all $x_0 \in \mathcal{N}_{n_0}$ we have

$$\hat p_0(x_0(\omega_0),\omega_0) \le \sup_{x_0 \in \mathbb{R}^{n_0}} \hat p_0(x_0,\omega_0). \tag{2.14}$$

The supremum on the right-hand side of (2.14) is attained for all $\omega_0 \in \Omega_0$ since $\hat p_0$ is usc and its effective domain is compact. Consequently, the multifunction

$$\Gamma(\omega_0) := \operatorname{argmax}\{\hat p_0(x_0,\omega_0) \mid x_0 \in \mathbb{R}^{n_0}\}$$

is non-empty-closed-valued and measurable on the entire space $\Omega_0$. Measurability follows from [91, theorem 2K]. Moreover, by [91, corollary 1C] there exists a measurable selector $x_{\mathrm{opt},0}: \Omega_0 \to \mathbb{R}^{n_0}$ such that

$$x_{\mathrm{opt},0}(\omega_0) \in \Gamma(\omega_0) \quad \forall \omega_0 \in \Omega_0.$$

Since $\Gamma(\omega_0) \subset X_0(Z^0)$ and $X_0(Z^0)$ is compact, we may conclude that $x_{\mathrm{opt},0}$ is essentially bounded. The estimate (2.14) together with the existence of an optimal policy $x_{\mathrm{opt},0} \in \mathcal{N}_{n_0}$ implies

$$(E\Phi_0) = \sup_{x_0 \in \mathcal{N}_{n_0}} \int_{\Omega_0} \hat p_0(x_0(\omega_0),\omega_0)\, dP_0(\omega_0). \tag{2.15}$$

This proves that the optimal value of (2.13) is given by $(E\Phi_0)$, and thus the basis step is established.

Next, assume that the claim holds for $t-1$ and that $x^{t-1}_{\mathrm{opt}} \in \mathcal{N}_{n^{t-1}}$ is an optimal policy. Substituting the definitions of the expectation functional and the recourse function $\Phi_t$ into the induction hypothesis, applying the generalized Fubini theorem,$^{6}$ and rearranging terms we obtain

$$\begin{aligned}
(E\Phi_0) &= \sup_{x^{t-1} \in \mathcal{N}_{n^{t-1}}} \int_{\Omega^{t-1}} \hat p_{t-1}(x^{t-1}(\omega^{t-1}),\omega^{t-1})\, dP^{t-1}(\omega^{t-1}) \\
&= \sup_{x^{t-1} \in \mathcal{N}_{n^{t-1}}} \int_{\Omega^t} \sup_{x_t \in \mathbb{R}^{n_t}} \hat p_t(x^{t-1}(\omega^{t-1}),x_t,\omega^t)\, dP^t(\omega^t).
\end{aligned}$$

$^{6}$ For every $\mathcal{B}(\Omega^t)$-measurable function $\varphi: \Omega^t \to [-\infty,\infty]$ we have the identity (for a proof see e.g. [2, theorem 2.6.4])

$$\int_{\Omega^t} \varphi(\omega^t)\, dP^t(\omega^t) = \int_{\Omega^{t-1}} \int_{\Omega_t} \varphi(\omega^t)\, dP_t(\omega_t \mid \omega^{t-1})\, dP^{t-1}(\omega^{t-1}).$$



Recall that $\hat p_t$ is usc and finite on its effective domain $Y^t$, which is compact. An argument parallel to the one in the basis step proves that $\hat p_t$ is a normal integrand on $\mathbb{R}^{n^t} \times \Omega^t$ in the sense of [91]. For the further argumentation we fix an arbitrary policy $x^{t-1} \in \mathcal{N}_{n^{t-1}}$. Then, the mapping

$$(x_t,\omega^t) \mapsto \hat p_t(x^{t-1}(\omega^{t-1}),x_t,\omega^t)$$

is a normal integrand on $\mathbb{R}^{n_t} \times \Omega^t$ due to [91, corollary 2P]. For all $x_t \in \mathcal{N}_{n_t}$ we have

$$\hat p_t(x^{t-1}(\omega^{t-1}),x_t(\omega^t),\omega^t) \le \sup_{x_t \in \mathbb{R}^{n_t}} \hat p_t(x^{t-1}(\omega^{t-1}),x_t,\omega^t). \tag{2.16}$$

The supremum on the right-hand side of (2.16) is attained for all $\omega^t \in \Omega^t$ since $\hat p_t$ is usc and its effective domain is compact. Consequently, the multifunction

$$\Gamma(\omega^t) := \operatorname{argmax}\{\hat p_t(x^{t-1}(\omega^{t-1}),x_t,\omega^t) \mid x_t \in \mathbb{R}^{n_t}\}$$

is non-empty-closed-valued and measurable on the entire space $\Omega^t$. Measurability follows from [91, theorem 2K]. Moreover, by [91, corollary 1C] there exists a measurable selector$^{7}$ $x^*_t: \Omega^t \to \mathbb{R}^{n_t}$ such that

$$x^*_t(\omega^t) \in \Gamma(\omega^t) \quad \forall \omega^t \in \Omega^t.$$

Since $\Gamma(\omega^t) \subset X_t(Z^t)$ and $X_t(Z^t)$ is compact, we may conclude that $x^*_t$ is essentially bounded. The estimate (2.16) together with the existence of an optimal policy $x^*_t \in \mathcal{N}_{n_t}$ implies

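$$\begin{aligned}
(E\Phi_0) &= \sup_{x^{t-1} \in \mathcal{N}_{n^{t-1}}} \int_{\Omega^t} \sup_{x_t \in \mathbb{R}^{n_t}} \hat p_t(x^{t-1}(\omega^{t-1}),x_t,\omega^t)\, dP^t(\omega^t) \\
&= \sup_{x^{t-1} \in \mathcal{N}_{n^{t-1}}} \sup_{x_t \in \mathcal{N}_{n_t}} \int_{\Omega^t} \hat p_t(x^{t-1}(\omega^{t-1}),x_t(\omega^t),\omega^t)\, dP^t(\omega^t) \\
&= \sup_{x^t \in \mathcal{N}_{n^t}} \int_{\Omega^t} \hat p_t(x^t(\omega^t),\omega^t)\, dP^t(\omega^t).
\end{aligned} \tag{2.17}$$

The last step proves that the optimal value of (2.13) is given by $(E\Phi_0)$. In the third line of (2.17) the two 'sup'-operators may be combined because of proposition A.4 in the appendix.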



$^{7}$ Notice that $x^*_t$ depends on the choice of $x^{t-1} \in \mathcal{N}_{n^{t-1}}$.







In order to establish solvability, consider the policy $x^{t-1}_{\mathrm{opt}} \in \mathcal{N}_{n^{t-1}}$ given by the induction hypothesis. Repeating the arguments of the previous paragraph, we can show that there is a measurable function $x_{\mathrm{opt},t} \in \mathcal{N}_{n_t}$ such that

$$x_{\mathrm{opt},t}(\omega^t) \in \operatorname{argmax}\{\hat p_t(x^{t-1}_{\mathrm{opt}}(\omega^{t-1}),x_t,\omega^t) \mid x_t \in \mathbb{R}^{n_t}\} \quad \forall \omega^t \in \Omega^t.$$

Thus, it can easily be verified that $x^t_{\mathrm{opt}} := (x^{t-1}_{\mathrm{opt}}, x_{\mathrm{opt},t})$ is a maximizer of (2.13) on the space of non-anticipative policies up to time $t$, i.e.

$$(E\Phi_0) = \int_{\Omega^t} \hat p_t(x^t_{\mathrm{opt}}(\omega^t),\omega^t)\, dP^t(\omega^t).$$

For $t = T$ problem (2.13) reduces to the static version of the stochastic program (2.8), and its optimal value coincides with $(E\Phi_0)$. In particular, (2.8) is solvable, and thus the claim is proved. □

Proposition 2.5 and theorem 2.6 are inspired by [92] and the dynamic 
programming framework developed in Klein Haneveld [64, Sect. 6.3]. 

Many decision problems of practical relevance can be formulated as multistage stochastic programs satisfying the regularity conditions (A1)-(A5).
However, sometimes it proves useful to work with a set of slightly more re- 
strictive regularity conditions, which impose stronger structural properties on 
the recourse functions. 



(A1)' = (A1);

(A2)' the profit function $p_t$ is continuous on $Y^t$ for all $t \in \tau$;

(A3)' the feasible set mapping $X_t$ is continuous and bounded on $Z^t$ for all $t \in \tau$;

(A4)' the random data $\{\omega_t\}_{t\in\tau}$ follow a nonlinear autoregressive process; $H_t$ is a continuous function, and $\mathcal{E}_t$ is compact for all $t \in \tau_{-0}$;

(A5)' = (A5).



These regularity conditions are obtained from the less restrictive conditions (A1)-(A5) if we replace upper semicontinuity by ordinary continuity in (A2) and (A3) and if we require the transformation function $H_t$ to be continuous in (A4). One can easily verify that the conditions (A1)-(A5) are implied by (A1)'-(A5)'. Notice that boundedness of the profit functions need not be postulated explicitly in (A2)'. Instead, it follows directly from the Weierstrass maximum principle, since $Y^t$ is compact by proposition 2.3. It is also clear that the statements of proposition 2.5 and theorem 2.6 as well as necessity in proposition 2.3 still hold true under the regularity conditions (A1)'-(A5)'. In addition, one can now prove continuity of the recourse functions.

Proposition 2.7. Under the assumptions (A1)'-(A5)' the recourse function $\Phi_t$ is continuous on $Z^t$ for all $t \in \tau$. Moreover, the expectation functional $(E_t\Phi_{t+1})$ is continuous on $Y^t$ for all $t \in \tau_{-T}$.
Proof. The claim is proved by backward induction with respect to $t$. For any outcome and decision history $(x^{T-1},\omega^T) \in Z^T$ the stage $T$ recourse function can be written as

$$\Phi_T(x^{T-1},\omega^T) = \sup\{p_T(x^T,\omega^T) \mid x_T \in X_T(x^{T-1},\omega^T)\}.$$

By assumption (A2)' the objective function of this maximization problem is continuous on $Y^T$. Moreover, assumption (A3)' requires the feasible set mapping $X_T$ to be continuous on $Z^T$. Thus, by Berge's theorem [13, theorem 12.1] $\Phi_T$ is continuous on $Z^T$. Hence, the basis step is proved.

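Assume now that the recourse function $\Phi_{t+1}$ is continuous on $Z^{t+1}$. Then, choosing an arbitrary convergent sequence $\{(x^t_k,\omega^t_k)\}_{k\in\mathbb{N}}$ in $Y^t$ with a limiting point $(x^t,\omega^t) \in Y^t$ we find

$$\begin{aligned}
\lim_{k\to\infty} (E_t\Phi_{t+1})(x^t_k,\omega^t_k)
&= \lim_{k\to\infty} \int_{\mathcal{E}_{t+1}} \Phi_{t+1}\big(x^t_k,\omega^t_k,H_{t+1}(\omega^t_k,\varepsilon_{t+1})\big)\, dQ_{t+1}(\varepsilon_{t+1}) \\
&= \int_{\mathcal{E}_{t+1}} \lim_{k\to\infty} \Phi_{t+1}\big(x^t_k,\omega^t_k,H_{t+1}(\omega^t_k,\varepsilon_{t+1})\big)\, dQ_{t+1}(\varepsilon_{t+1}) \\
&= \int_{\mathcal{E}_{t+1}} \Phi_{t+1}\big(x^t,\omega^t,H_{t+1}(\omega^t,\varepsilon_{t+1})\big)\, dQ_{t+1}(\varepsilon_{t+1}) \\
&= (E_t\Phi_{t+1})(x^t,\omega^t).
\end{aligned}$$

The second and the third equalities hold because of the dominated convergence theorem (which applies since $\Phi_{t+1}$ is bounded and $Q_{t+1}$ is a probability measure, so that constant functions are integrable) and continuity of the integrand, respectively. Therefore, $(E_t\Phi_{t+1})$ is continuous on its natural domain $Y^t$.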

For $(x^{t-1},\omega^t) \in Z^t$ the stage $t$ recourse function is representable as

$$\begin{aligned}
\Phi_t(x^{t-1},\omega^t) = \sup\ & p_t(x^t,\omega^t) + (E_t\Phi_{t+1})(x^t,\omega^t) \\
\text{s.t.}\ & x_t \in X_t(x^{t-1},\omega^t).
\end{aligned}$$

By assumption (A2)' and continuity of the expectation functional, the objective function of this maximization problem is continuous on $Y^t$. In addition, assumption (A3)' requires the feasible set mapping $X_t$ to be continuous on $Z^t$. Thus, as in the basis step, Berge's theorem [13, theorem 12.1] guarantees continuity of $\Phi_t$ on $Z^t$. By induction, $\Phi_t$ is continuous for every $t \in \tau$, and $(E_t\Phi_{t+1})$ is continuous for every $t \in \tau_{-T}$. □

Although many stochastic programs of practical relevance comply with 
the regularity conditions (A1)-(A5), sometimes it might be necessary to cir- 
cumvent or replace assumption (A4), which does not allow the distribution of 
the disturbances to depend on the outcome history. On these occasions, one 
may invoke a different qualification of the conditional probabilities to preserve upper semicontinuity of the expectation functionals. However, a much simpler approach is available in the presence of discrete probability measures. Then, upper semicontinuity of the value functions is not necessary to prove equivalence of the static and dynamic versions of a given stochastic program. Instead, the following set of regularity conditions will suffice.



(A1)'' = (A1); (A2)'' = (A2); (A3)'' = (A3);

(A4)'' the probability measure $P_t(\cdot \mid \omega^{t-1})$ is discrete with finite support

$$\operatorname{supp} P_t(\cdot \mid \omega^{t-1}) = \{\omega_{t,i_t}(\omega^{t-1}) \mid i_t = 1,\ldots,I_t\},$$

and the probability of the $i_t$'th atom amounts to $p_{t,i_t}(\omega^{t-1})$ for all $t \in \tau_{-0}$ and $\omega^{t-1} \in \Omega^{t-1}$; in addition, the marginal measure $P_0$ has finite support with atoms $\omega_{0,i_0}$ and associated probabilities $p_{0,i_0}$ for $i_0 = 1,\ldots,I_0$;

(A5)'' = (A5).



By definition, $P_t$ being a regular conditional probability requires the atoms $\omega_{t,i_t}$ and the associated probabilities $p_{t,i_t}$ to be measurable functions on $\Omega^{t-1}$, which will always be assumed implicitly. Notice that proposition 2.3 still holds under the conditions (A1)''-(A5)''. Moreover, corollary 2.8 below is the counterpart of proposition 2.5.

Corollary 2.8. Under the assumptions (A1)''-(A5)'' the recourse function $\Phi_t$ is bounded, measurable in $\omega^t$, and usc in $x^{t-1}$ on its natural domain $Z^t$ for all $t \in \tau_{-0}$. Moreover, $\Phi_0$ is bounded and measurable on $Z^0 = \Omega_0$.

Proof. As usual, the claim is shown by backward induction with respect to $t$. Consider first the recourse function $\Phi_T$. Boundedness and upper semicontinuity in $x^{T-1}$ can be shown as in proposition 2.5, while measurability in $\omega^T$ is straightforward. This observation completes the basis step.

Next, assume that the recourse function $\Phi_{t+1}$ is bounded, measurable in $\omega^{t+1}$, and usc in $x^t$ on $Z^{t+1}$. Consequently, the expectation functional

$$(E_t\Phi_{t+1})(x^t,\omega^t) = \sum_{i_{t+1}=1}^{I_{t+1}} p_{t+1,i_{t+1}}(\omega^t)\, \Phi_{t+1}\big(x^t,\omega^t,\omega_{t+1,i_{t+1}}(\omega^t)\big)$$

is bounded, measurable in $\omega^t$, and usc in $x^t$ on $Y^t$ by (A4)'' and the induction hypothesis. Remember also that products and compositions of Borel measurable functions are Borel measurable. Next, consider the recourse function $\Phi_t$. Boundedness and upper semicontinuity in $x^{t-1}$ can be shown by arguing as in proposition 2.5, and measurability in $\omega^t$ is obvious. These observations complete the induction step, and thus the claim follows. □
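Under (A4)'' the expectation functional is the finite sum displayed in the proof, and the atoms and probabilities may depend on the outcome history. The short Python sketch below evaluates that sum; the atom function `toy_atoms` and the stand-in `toy_phi` for $\Phi_{t+1}$ are hypothetical and only serve to exercise the formula.

```python
from typing import Callable, List, Sequence, Tuple

AtomFn = Callable[[Sequence[float]], List[Tuple[float, float]]]  # w^t -> [(w_{t+1,i}, p_{t+1,i})]
PhiFn = Callable[[Sequence[float], Sequence[float]], float]        # (x^t, w^{t+1}) -> Phi_{t+1}

def expectation_functional(phi_next: PhiFn, atoms: AtomFn,
                           x_hist: Sequence[float], w_hist: Sequence[float]) -> float:
    """(E_t Phi_{t+1})(x^t, w^t) = sum_i p_{t+1,i}(w^t) * Phi_{t+1}(x^t, (w^t, w_{t+1,i}(w^t)))."""
    support = atoms(w_hist)
    if abs(sum(p for _, p in support) - 1.0) > 1e-9:
        raise ValueError("atom probabilities must sum to one")
    return sum(p * phi_next(x_hist, list(w_hist) + [w]) for w, p in support)

# Hypothetical history-dependent atoms: w_{t+1} = w_t + 1 or w_t - 1 with probabilities 0.3 / 0.7.
def toy_atoms(w_hist: Sequence[float]) -> List[Tuple[float, float]]:
    return [(w_hist[-1] + 1.0, 0.3), (w_hist[-1] - 1.0, 0.7)]

# Hypothetical stand-in for Phi_{t+1}: sell min(x_t, w_{t+1}) units.
def toy_phi(x_hist: Sequence[float], w_hist: Sequence[float]) -> float:
    return min(x_hist[-1], w_hist[-1])
```

Because the atoms move with $\omega_t$, the conditional distribution here depends on the outcome history, which the discrete setting of (A4)'' permits.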
Corollary 2.9. Under the assumptions (A1)''-(A5)'' the static and dynamic versions of a multistage stochastic program are both solvable, and the optimal values coincide.

Proof. Equivalence is straightforward since here integrals generally reduce to 
finite sums over the atoms of some marginal or conditional probability mea- 
sure. Thus, interchanging the order of maximization and summation is un- 
problematic. Solvability is due to upper semicontinuity of the profit functions 
and the expectation functionals as well as compactness of the feasible sets. □ 
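Corollary 2.9 can be checked numerically on a tiny example. The Python sketch below uses a hypothetical two-stage model (buy $x_0$ at unit cost 0.6, then sell $x_1 \le \min\{x_0,\omega_1\}$ after observing the demand $\omega_1$) with a finite decision grid standing in for $\mathbb{R}^{n_t}$. It computes the optimal value once by the backward recursion (2.11) and once by brute-force enumeration of all non-anticipative policies in the static form (2.10); all names and data are illustrative assumptions.

```python
import itertools
import math
from typing import Tuple

# Hypothetical two-stage toy (T = 1); every number below is an illustrative assumption.
GRID: Tuple[float, ...] = (0.0, 0.25, 0.5, 0.75, 1.0)                         # stands in for R^{n_t}
ATOMS: Tuple[Tuple[float, float], ...] = ((0.0, 0.25), (0.5, 0.25), (2.0, 0.5))  # (w_1, prob)
COST = 0.6                                                                     # w_0 is deterministic

def p_bar0(x0: float) -> float:
    """Effective stage-0 profit: pay for x_0 units."""
    return -COST * x0

def p_bar1(x0: float, x1: float, w1: float) -> float:
    """Effective stage-1 profit: sell x_1 <= min(x_0, w_1) units, else -inf."""
    return x1 if 0.0 <= x1 <= min(x0, w1) else -math.inf

def dynamic_value() -> float:
    """Backward recursion (2.11): Phi_1 first, then Phi_0 at the deterministic w_0."""
    def phi1(x0: float, w1: float) -> float:
        return max(p_bar1(x0, x1, w1) for x1 in GRID)
    return max(p_bar0(x0) + sum(p * phi1(x0, w1) for w1, p in ATOMS) for x0 in GRID)

def static_value() -> float:
    """Static version (2.10): enumerate every non-anticipative policy (x_0, x_1(w_1))."""
    best = -math.inf
    for x0 in GRID:  # x_0 must not depend on w_1
        for rule in itertools.product(GRID, repeat=len(ATOMS)):  # one x_1 per outcome w_1
            value = p_bar0(x0) + sum(p * p_bar1(x0, x1, w1)
                                     for (w1, p), x1 in zip(ATOMS, rule))
            best = max(best, value)
    return best
```

Both functions return 0.075, attained at $x_0 = 0.5$. The enumeration visits $5 \cdot 5^3 = 625$ candidate policies, while the recursion needs only 75 evaluations of the stage 1 profit, which hints at why the dynamic version is attractive computationally.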



2.5 Here-and-Now Strategies 



By definition, a strategy or policy is a decision rule $x$ specifying the actions to be executed in each time step and in each scenario. Remember that non-anticipativity requires the measurable functions $x_t$ to depend only on observations up to time $t$. In a more abstract formulation, the multistage stochastic program (2.8) reads

$$(E\Phi_0) = \sup_{x \in \mathcal{N}_n} E\big(p(x(\omega),\omega)\big). \tag{2.18}$$

Here, the objective function is given by the expectation value of

$$p(x,\omega) := \sum_{t=0}^{T} \bar p_t(x^t,\omega^t).$$

Optimal policies $x_{\mathrm{opt}} \in \operatorname{argmax}_{x \in \mathcal{N}_n} E(p(x(\omega),\omega))$ are called here-and-now strategies. On average, the implementation of a here-and-now strategy yields the optimal return $(E\Phi_0) = E(p(x_{\mathrm{opt}}(\omega),\omega))$.



2.6 Wait-and-See Strategies 

If a decision maker can perfectly forecast the future outcomes of $\omega$, he or she ends up with a deterministic scenario problem conditional on the anticipated realization $\omega$:

$$\overline{\Phi}_0(\omega) := \sup_{x \in \mathbb{R}^n} p(x,\omega). \tag{2.19}$$

Before the forecast, the expected earnings amount to $(E\overline{\Phi}_0) := E(\overline{\Phi}_0(\omega))$. Notice that $(E\overline{\Phi}_0)$ would coincide with the result of (2.18) if the non-anticipativity restrictions were withdrawn. Thus, $(E\overline{\Phi}_0)$ is the result of the following optimization problem:
$$(E\overline{\Phi}_0) = \sup_{x \in \mathcal{L}^\infty_n} E\big(p(x(\omega),\omega)\big). \tag{2.20}$$

In fact, the mathematical program (2.20) has the same objective function as the here-and-now problem (2.18). However, the space of non-anticipative policies $\mathcal{N}_n$ is a subset of the space of anticipative policies $\mathcal{L}^\infty_n$, and thus $(E\overline{\Phi}_0)$ constitutes an upper bound for $(E\Phi_0)$. In particular, the indicator $\mathrm{EVPI} := (E\overline{\Phi}_0) - (E\Phi_0)$ is non-negative and denotes the so-called expected value of perfect information. The EVPI was first introduced by Raiffa and Schlaifer [88], and it determines the value of knowing the realization of the random variable $\omega$ in advance, in contrast to knowing only the distribution of $\omega$. Alternatively, the EVPI can be interpreted as an insurance price or the maximum amount a rational actor would pay for perfect information about the future. Furthermore, optimal (anticipative) policies $x_{\mathrm{ws}} \in \operatorname{argmax}_{x \in \mathcal{L}^\infty_n} E(p(x(\omega),\omega))$ are referred to as wait-and-see strategies. This terminology is due to Madansky [70]. As indicated above, a wait-and-see strategy achieves expected returns at least as high as those of a here-and-now strategy. However, its implementation requires perfect forecasts of the uncertain parameters; such forecasts are often expensive or unavailable.
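The EVPI is easy to compute when $\omega$ has finitely many outcomes. The Python sketch below uses a hypothetical two-stage buy-and-sell example (unit cost 0.6, demand $\omega_1 \in \{0, 0.5, 2\}$ with probabilities 0.25, 0.25, 0.5, finite decision grid): the here-and-now value fixes $x_0$ before $\omega_1$ is revealed, whereas the wait-and-see value solves the scenario problem (2.19) separately for each outcome. All data and names are assumptions made for illustration.

```python
import math
from typing import Tuple

# Hypothetical two-stage data (illustrative assumptions only).
GRID: Tuple[float, ...] = (0.0, 0.25, 0.5, 0.75, 1.0)
ATOMS: Tuple[Tuple[float, float], ...] = ((0.0, 0.25), (0.5, 0.25), (2.0, 0.5))  # (w_1, prob)
COST = 0.6

def scenario_profit(x0: float, x1: float, w1: float) -> float:
    """p(x, w) = bar p_0 + bar p_1 for one scenario; -inf if a constraint is violated."""
    if not 0.0 <= x1 <= min(x0, w1):
        return -math.inf
    return -COST * x0 + x1

def here_and_now() -> float:
    """(E Phi_0): x_0 is fixed before w_1 is revealed, x_1 may react to w_1."""
    return max(sum(p * max(scenario_profit(x0, x1, w1) for x1 in GRID) for w1, p in ATOMS)
               for x0 in GRID)

def wait_and_see() -> float:
    """Expected value of the scenario problems (2.19), solved separately for each w_1."""
    return sum(p * max(scenario_profit(x0, x1, w1) for x0 in GRID for x1 in GRID)
               for w1, p in ATOMS)

evpi = wait_and_see() - here_and_now()
```

In this toy model the here-and-now value is 0.075 and the wait-and-see value is 0.25, so the EVPI equals 0.175: with perfect foresight the decision maker buys exactly the quantity that will be sold.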



2.7 Mean-Value Strategies



If a decision maker is unable or unwilling to take account of the stochasticity inherent to an optimization problem, he or she has to treat the random variables as deterministic parameters (whose values should be chosen with diligence). For instance, given a sequence of outcomes $\omega^t$ known by observation, a reasonable approximation of the future uncertainties is given by their conditional expectation values$^{8}$ $E_t(\omega_s)$. Thus, at time $t$ the decision maker under consideration solves the deterministic optimization problem



$$\begin{aligned}
\sup_{x \in \mathbb{R}^n}\ & \sum_{s=t}^{T} \bar p_s\big(x^s, E_t(\omega^s)\big) \\
\text{s.t.}\ & x^{t-1} = x^{t-1}_{\mathrm{mv}}(\omega^{t-1}),
\end{aligned} \tag{2.21}$$

where the vector $x^{t-1}_{\mathrm{mv}}(\omega^{t-1})$ contains those decisions which have been implemented previously in response to the observations $\omega^{t-1}$, and $E_t(\omega^s)$ stands for $(\omega^t, E_t(\omega_{t+1}),\ldots,E_t(\omega_s))$. Subsequently, the 'optimal' stage $t$ decision $x_{\mathrm{mv},t}(\omega^t)$, as evaluated in (2.21), is implemented, and the outcome $\omega_{t+1}$ is observed. Of course, the realization $\omega_{t+1}$ of the random vector is subject to the underlying conditional probability distribution and does not necessarily coincide with the conditional expectation $E_t(\omega_{t+1})$ (as suggested in (2.21)). Therefore, the optimal stage $t+1$ decision of (2.21) is no longer adequate and could even be infeasible. Instead, the actions $x_{\mathrm{mv},s}(\omega^s)$ have to be determined with a rolling optimization scheme for $s = t+1,\ldots,T$. At every decision stage, problem (2.21) must be recalibrated, i.e. the conditional expectations and the restrictions have to be updated consistently with the new observations and the implemented decisions.

$^{8}$ Notice that the conditional expectation $E_t(\omega_s)$ is an ordinary function of the outcomes $\omega^t$ observed up to time $t$ $(\forall s \in \tau)$.

By construction, the policy $x_{\mathrm{mv}}$ obtained with this rolling optimization scheme is non-anticipative and feasible,$^{9}$ i.e. $x_{\mathrm{mv}} \in \mathcal{N}_n$. The expected profit attained with this relatively modest decision strategy amounts to

$$(E\underline{\Phi}_0) := E\big(p(x_{\mathrm{mv}}(\omega),\omega)\big). \tag{2.22}$$

Obviously, $(E\underline{\Phi}_0)$ is a lower bound for $(E\Phi_0) = E(p(x_{\mathrm{opt}}(\omega),\omega))$, since $x_{\mathrm{opt}}$ maximizes the expected payoff over $\mathcal{N}_n$. Intuitively speaking, the error of disregarding the randomness of certain problem determinants lowers the achievable profit. Following Birge and Louveaux [10, Sect. 4.2], we can introduce the non-negative indicator $\mathrm{VSS} := (E\Phi_0) - (E\underline{\Phi}_0)$, which represents the value of the stochastic solution (alternative definitions are possible). The VSS measures the relative performance of the here-and-now strategy $x_{\mathrm{opt}}$ as compared to the optimal mean-value strategy $x_{\mathrm{mv}}$ and characterizes the maximum amount a rational decision maker would invest for incorporating uncertainty into an optimization problem.
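For a hypothetical two-stage buy-and-sell example (unit cost 0.6, demand $\omega_1 \in \{0, 0.5, 2\}$ with probabilities 0.25, 0.25, 0.5, finite decision grid), the Python sketch below computes the mean-value decision of (2.21) with $\omega_1$ replaced by $E_0(\omega_1) = 1.125$, evaluates it as in (2.22) after re-optimizing $x_1$ once $\omega_1$ is observed (the rolling step), and reports the VSS. The data, grid, and function names are illustrative assumptions.

```python
import math
from typing import Tuple

# Hypothetical two-stage data (illustrative assumptions only).
GRID: Tuple[float, ...] = (0.0, 0.25, 0.5, 0.75, 1.0)
ATOMS: Tuple[Tuple[float, float], ...] = ((0.0, 0.25), (0.5, 0.25), (2.0, 0.5))  # (w_1, prob)
COST = 0.6

def profit(x0: float, x1: float, w1: float) -> float:
    """Scenario profit with -inf for infeasible decisions."""
    if not 0.0 <= x1 <= min(x0, w1):
        return -math.inf
    return -COST * x0 + x1

def best_x1(x0: float, w1: float) -> float:
    """Stage-1 decision after w_1 is observed (the recalibrated problem (2.21) at t = 1)."""
    return max(GRID, key=lambda x1: profit(x0, x1, w1))

def mean_value_x0() -> float:
    """Stage-0 decision of (2.21) at t = 0, with w_1 replaced by E_0(w_1)."""
    w_mean = sum(p * w for w, p in ATOMS)
    return max(GRID, key=lambda x0: max(profit(x0, x1, w_mean) for x1 in GRID))

def expected_profit(x0: float) -> float:
    """Expected profit of buying x0 and then reacting optimally to each w_1."""
    return sum(p * profit(x0, best_x1(x0, w1), w1) for w1, p in ATOMS)

here_and_now = max(expected_profit(x0) for x0 in GRID)   # (E Phi_0)
mean_value = expected_profit(mean_value_x0())           # lower bound (2.22)
vss = here_and_now - mean_value
```

The mean-value decision buys one unit because it plans for a demand of 1.125; it earns 0.025 on average against 0.075 for the here-and-now decision $x_0 = 0.5$, so the VSS equals 0.05 in this toy model.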

The above reasoning remains valid for any deterministic process which is used to approximate the future uncertainties; taking the conditional expectation value is just an intuitive though arbitrary choice [8]. The numerical value of the VSS is certainly sensitive to the choice of the deterministic process, but it will always be non-negative.



$^{9}$ Using similar arguments as in the proof of theorem 2.6, it can be shown that $x_{\mathrm{mv}}$ is a measurable mapping. Boundedness and non-anticipativity are straightforward.
Convex Stochastic Programs



In this chapter we focus on convex maximization problems. Thus, we assume 
the profit functions to be concave and the constraint functions to be convex 
in the decision variables. Under certain regularity conditions, which will be 
specified in Sect. 3.5, the recourse function of a convex multistage stochastic 
program is subdifferentiable and exhibits a characteristic saddle structure. 
We aim at exploiting these properties in order to discretize the underlying 
probability space in an efficient way. 

Below, in order to simplify notation, random variables as well as their realizations will both be denoted by $\omega_t$. Similarly, we use the same symbol $x_t$ for the entire decision rule and the realized actions, as long as there is no risk of confusion.



3.1 Augmenting the Probability Space 



The uncertain parameters $\omega_t$ may influence both the profit and the constraint functions of any prototype optimization problem (2.8), which will be assumed to comply at least with the elementary regularity conditions (A1)-(A5). Based on this notion, Frauendorfer [42, Sect. 5] suggests a classification of the components of $\omega_t$. Let us therefore introduce two subvectors $\eta_t$ and $\xi_t$ of the random vector $\omega_t$, which are defined as follows:

(a) $\eta_t$ comprises all components of $\omega_t$ which influence the profit functions in a nontrivial way;

(b) $\xi_t$ consists of those components of $\omega_t$ which influence the constraint functions in a nontrivial way.

Some components of $\omega_t$ can be assigned exclusively either to $\eta_t$ or to $\xi_t$. By contrast, if this assignment is not unique for a specific component, we attach it to both subvectors and interpret it as a degenerate pair of uncertain parameters. The construction of $\eta_t$ and $\xi_t$ can be formalized by means of two problem-specific coordinate projections $\pi^o_t$ and $\pi^r_t$ ($t\in\tau$).

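$$\pi^o_t:\mathbb{R}^{M_t}\to\mathbb{R}^{K_t},\quad \omega_t\mapsto\pi^o_t(\omega_t)=\eta_t,\qquad\qquad \pi^r_t:\mathbb{R}^{M_t}\to\mathbb{R}^{L_t},\quad \omega_t\mapsto\pi^r_t(\omega_t)=\xi_t$$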

By assumption, $\Theta_t := \pi^o_t(\Omega_t)$ is a compact subset of $\mathbb{R}^{K_t}$ and covers the support of $\eta_t$, whereas $\Xi_t := \pi^r_t(\Omega_t)$ is a compact subset of $\mathbb{R}^{L_t}$ and covers the support of $\xi_t$. In complete analogy to the conventions met in Sect. 2.2, we form separate outcome histories

$$\eta^t := (\eta_0,\ldots,\eta_t)\in\Theta^t := \Theta_0\times\cdots\times\Theta_t\subset\mathbb{R}^{K^t},$$

$$\xi^t := (\xi_0,\ldots,\xi_t)\in\Xi^t := \Xi_0\times\cdots\times\Xi_t\subset\mathbb{R}^{L^t}.$$

As usual, the dimensions of the underlying Euclidean spaces are required to match up ($K^t = K_0+\cdots+K_t$, $L^t = L_0+\cdots+L_t$), and in the last decision stage the indices may be left out: $\eta := \eta^T$, $\Theta := \Theta^T\subset\mathbb{R}^K$, $\xi := \xi^T$, and $\Xi := \Xi^T\subset\mathbb{R}^L$ ($K := K^T$ and $L := L^T$). Formally speaking, the elements $\eta\in\Theta$ and $\xi\in\Xi$ corresponding to a specific outcome $\omega\in\Omega$ are obtained by means of the coordinate projections $\pi^o$ and $\pi^r$.

$$\pi^o := \pi^o_0\times\cdots\times\pi^o_T,\qquad \pi^r := \pi^r_0\times\cdots\times\pi^r_T$$

These projections enter the definition of a measurable transformation $\chi$ which relates $\Omega$ to $\Theta\times\Xi$ (cf. Fig. 3.1).

$$\chi:\Omega\to\Theta\times\Xi,\qquad \omega\mapsto\big(\pi^o(\omega),\pi^r(\omega)\big)=(\eta,\xi)$$

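The measurable space $(\Theta\times\Xi,\mathcal{B}(\Theta\times\Xi))$ basically inherits the probability measure $P$ defined on $(\Omega,\mathcal{B}(\Omega))$. In fact, the induced probability measure $P^a$ on $(\Theta\times\Xi,\mathcal{B}(\Theta\times\Xi))$ is characterized through

$$P^a(A) := P\big(\chi^{-1}(A)\big)\qquad\text{for } A\in\mathcal{B}(\Theta\times\Xi).$$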
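For a finite outcome space the induced measure $P^a$ is simply the push-forward of $P$ under $\chi$. The Python sketch below illustrates this for a hypothetical outcome $\omega = (w_1, w_2, w_3)$ in which $w_2$ is a degenerate pair, i.e. it is copied into both $\eta = (w_1, w_2)$ and $\xi = (w_2, w_3)$. The outcome space, probabilities and projections are assumptions chosen for illustration; the sketch shows that $\chi(\Omega)$ is a strict subset of $\Theta\times\Xi$ and that sets missing $\chi(\Omega)$ receive zero mass.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical discrete outcome space: omega = (w1, w2, w3) with its probabilities.
P = {(0, 0, 1): Fraction(1, 2), (0, 1, 1): Fraction(1, 4), (1, 1, 0): Fraction(1, 4)}

def pi_o(omega):  # components influencing the profit functions: eta = (w1, w2)
    return omega[0], omega[1]

def pi_r(omega):  # components influencing the constraints: xi = (w2, w3)
    return omega[1], omega[2]

def chi(omega):   # w2 is a degenerate pair and therefore appears in eta and in xi
    return pi_o(omega), pi_r(omega)

# Induced measure P^a(A) = P(chi^{-1}(A)), stored on its finite support chi(Omega).
P_a = defaultdict(Fraction)
for omega, prob in P.items():
    P_a[chi(omega)] += prob

def prob_a(A):
    """P^a of an arbitrary set A of pairs (eta, xi)."""
    return sum(P_a[point] for point in A if point in P_a)

# Theta x Xi is larger than chi(Omega): degeneracy lowers the dimension of chi(Omega).
Theta = {eta for eta, _ in P_a}
Xi = {xi for _, xi in P_a}
print(len(P_a), len(Theta) * len(Xi))
```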

In the sequel, the triple $(\Theta\times\Xi,\mathcal{B}(\Theta\times\Xi),P^a)$ is denoted as the augmented probability space. It can easily be verified that $P^a$ is a regular probability measure, whose support is covered by $\Theta\times\Xi$. Notice that $P^a(A)=0$ if $A\cap\chi(\Omega)=\emptyset$. Next, we define new profit and constraint functions on the augmented probability space.



$$p^a_t(x^t,\eta^t) := p_t(x^t,\omega^t),\qquad f^a_t(x^t,\xi^t) := f_t(x^t,\omega^t)\qquad\forall t\in\tau,\ (\eta,\xi)=\chi(\omega)$$
Fig. 3.1. (a) If there are no degenerate pairs of random variables, we find $\chi(\Omega)=\Omega$. Nevertheless, $\Theta\times\Xi$ does not necessarily coincide with $\Omega$. Figure (b) shows the case of total degeneracy with $\Theta=\Xi=\Omega$. Generally speaking, degeneracy implies that $\chi(\Omega)$ has lower dimension than the product space $\Theta\times\Xi$



Observe that $p^a_t(x^t,\cdot)$ is a Borel measurable function on $\Theta^t$ instead of $\Omega^t$, which is independent of $\xi^t$. Similarly, $f^a_t(x^t,\cdot)$ represents a Borel measurable mapping on $\Xi^t$ being independent of $\eta^t$. In fact, Borel measurability of $p^a_t$ and $f^a_t$ follows from Borel measurability of $p_t$ and $f_t$, respectively, as may be seen from part (1) of the proof of [2, theorem 2.6.4]. Then, let us introduce the static version of a stochastic program on the augmented probability space.



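$$\sup_{x\in\mathcal{N}^n_a}\ \int_{\Theta\times\Xi}\sum_{t=0}^{T}p^a_t(x^t,\eta^t)\,dP^a(\eta,\xi)\qquad\text{s.t. } f^a_t(x^t,\xi^t)\le 0\ \ P^a\text{-a.s.},\ t\in\tau \tag{3.1}$$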



Here, $\mathcal{N}^n_a$ represents the space of essentially bounded non-anticipative policy functions on $(\Theta\times\Xi,\mathcal{B}(\Theta\times\Xi),P^a)$. By means of the change of variable formula of integration theory it can be shown that (3.1) is equivalent to the underlying original problem (2.8), see e.g. [42, Sect. 5]. In fact, both problems have the same optimal values, and their optimizers are related by the measurable transformation $\chi$.

Intuitively speaking, there is no loss of information if we ignore those subspaces on which the profit and constraint functions are constant. Thus, $p^a_t$ and $f^a_t$ will be identified with $p_t$ and $f_t$ in the remainder, and the superscript 'a' is suppressed to simplify notation. Moreover, for the sake of transparency we will write $P$ and $\mathcal{N}^n$ instead of $P^a$ and $\mathcal{N}^n_a$, respectively. Sometimes it is useful to move the explicit constraints into the objective function. Therefore, in complete analogy to (2.9), we introduce the effective profit functions



$$\bar p_t(x^t,\eta^t,\xi^t) := \begin{cases} p_t(x^t,\eta^t) & \text{for } f_t(x^t,\xi^t)\le 0,\\ -\infty & \text{else.}\end{cases}$$
Based on the above definitions in connection with the augmented probability space, we can also state the dynamic version of the stochastic program (3.1). First of all, the parametric family of stage $T$ subproblems reads

$$\Phi_T(x^{T-1},\eta^T,\xi^T) := \sup_{x_T\in\mathbb{R}^{n_T}}\bar p_T(x^T,\eta^T,\xi^T),\tag{3.2a}$$

whereas the subproblems for $t=1,\ldots,T-1$ are given by

$$\Phi_t(x^{t-1},\eta^t,\xi^t) := \sup_{x_t\in\mathbb{R}^{n_t}}\bar p_t(x^t,\eta^t,\xi^t)+(\mathbb{E}_t\Phi_{t+1})(x^t,\eta^t,\xi^t).\tag{3.2b}$$

On the other hand, the zero stage problems can be written as

$$\Phi_0(\eta_0,\xi_0) := \sup_{x_0\in\mathbb{R}^{n_0}}\bar p_0(x_0,\eta_0,\xi_0)+(\mathbb{E}_0\Phi_1)(x_0,\eta_0,\xi_0).\tag{3.2c}$$

The optimal value functions in (3.2) are usually called recourse functions. Moreover, we made use of the expectation functionals $(\mathbb{E}_t\Phi_{t+1})$, which are defined in the obvious way for $t=0,\ldots,T-1$:

$$(\mathbb{E}_t\Phi_{t+1})(x^t,\eta^t,\xi^t) := \int\Phi_{t+1}(x^t,\eta^{t+1},\xi^{t+1})\,dP_{t+1}(\eta_{t+1},\xi_{t+1}\mid\eta^t,\xi^t)\tag{3.3}$$

The optimal value $(\mathbb{E}\Phi_0) := \int\Phi_0\,dP_0$ corresponding to the dynamic program (3.2) is given by the unconditional expectation of the recourse function $\Phi_0$.
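The backward recursion (3.2) together with the expectation functional (3.3) can be sketched in Python for a hypothetical two-stage instance ($T = 1$): a capacity $x_0$ is bought at unit cost 1, then the demand $\xi_1$ is observed and sales $x_1 \le \min(x_0,\xi_1)$ earn 3 per unit. The instance, the finite demand distribution (independent of the past) and the replacement of each supremum by a search over integer decisions are all assumptions made for illustration. Infeasible decisions receive the value $-\infty$, mimicking the effective profit functions.

```python
from fractions import Fraction
import math

# Hypothetical instance: demand xi1 with a discrete law independent of the past.
DEMAND = [(2, Fraction(1, 4)), (6, Fraction(1, 4)), (10, Fraction(1, 2))]
GRID = range(0, 11)  # integer decisions suffice for this piecewise-linear example

def p_bar_1(x0, x1, xi1):
    """Effective stage-1 profit: constraints moved into the objective as -inf."""
    feasible = 0 <= x1 <= x0 and x1 <= xi1
    return 3 * x1 if feasible else -math.inf

def phi_1(x0, xi1):
    """Stage-1 recourse function, cf. (3.2a): sup over x1 of the effective profit."""
    return max(p_bar_1(x0, x1, xi1) for x1 in GRID)

def expected_phi_1(x0):
    """Expectation functional (E_0 Phi_1)(x0), cf. (3.3)."""
    return sum(q * phi_1(x0, xi1) for xi1, q in DEMAND)

def phi_0():
    """Stage-0 problem, cf. (3.2c), with effective profit p_bar_0(x0) = -x0."""
    return max((-x0 + expected_phi_1(x0), x0) for x0 in GRID)

value, x0_opt = phi_0()
print(value, x0_opt)
```

Since the stage-0 data are deterministic here, $\Phi_0$ is a constant and coincides with the optimal value $(\mathbb{E}\Phi_0)$.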
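The feasible set mappings

$$\xi_0\mapsto X_0(\xi_0)\qquad\text{and}\qquad(x^{t-1},\xi^t)\mapsto X_t(x^{t-1},\xi^t)\ \ \text{for } t=1,\ldots,T$$

are defined as in Sect. 2.3 and can be interpreted as multifunctions which are independent of $\eta$. Then, for any $t\in\tau$ the nested feasible set mapping $\mathcal{X}^t$ is defined through

$$\mathcal{X}^t:\Xi^t\to 2^{\mathbb{R}^{n^t}},\qquad \mathcal{X}^t(\xi^t) := \{x^t\in\mathbb{R}^{n^t}\mid x_0\in X_0(\xi_0),\ldots,x_t\in X_t(x^{t-1},\xi^t)\}.$$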

Many statements about the dynamic program (3.2) are most conveniently expressed in terms of the generalized feasible sets $Y^t$, which comprise the joint outcome and decision histories feasible up to time $t$, $t\in\tau$. Note that, unlike in Chap. 2, $Y^t$ is now built on the augmented probability space. For every $t\in\tau$ set

$$Y^t := \{(x^t,\eta^t,\xi^t)\mid\eta^t\in\Theta^t,\ \xi^t\in\Xi^t,\ x^t\in\mathcal{X}^t(\xi^t)\}.$$

In fact, $Y^t$ is interpreted as the natural domain of the profit function $p_t$, the expectation functional $(\mathbb{E}_t\Phi_{t+1})$, and the constraint function $f_t$. For the sake of transparent notation we need the related generalized feasible sets $Z^t$, which contain no stage $t$ decisions, $t\in\tau$. Set $Z^0 := \Theta_0\times\Xi_0$, and for $t=1,\ldots,T$ define

$$Z^t := \{(x^{t-1},\eta^t,\xi^t)\mid\eta^t\in\Theta^t,\ \xi^t\in\Xi^t,\ x^{t-1}\in\mathcal{X}^{t-1}(\xi^{t-1})\}.$$

Notice that $Z^t$ basically constitutes the natural domain of the recourse function of stage $t$.
3.2 Preliminary Definitions

Having introduced the basic notation and terminology, we now turn to the formulation of appropriate regularity conditions for problem (3.1), which are slightly stronger than (A1)-(A5). In particular, we need suitable constraint qualifications (see e.g. [6, Chap. 5] for a survey of common constraint qualifications and their use for validating optimality conditions). Thus, in Sect. 3.2.1 we will discuss Slater's constraint qualification, always taking account of the fact that $f_t := (f^{in}_t, f^{eq}_t, -f^{eq}_t)$ includes both inequality and equality constraints. Furthermore, in Sect. 3.2.2 we will introduce a generalized notion of convexity allowing for extended-real-valued functions defined on nonconvex domains. Finally, Sect. 3.2.3 is devoted to the study of a class of autoregressive stochastic processes, which will play a major role in proving the main results of the present chapter.



3.2.1 Slater’s Constraint Qualification 

In this section we introduce Slater's constraint qualification and demonstrate its importance in the field of parametric optimization. For the sake of transparent notation, we consider a simple parametric optimization problem over the Euclidean space $\mathbb{R}^n$,

$$\sup_{x\in\mathbb{R}^n}\ p(x,\omega)\qquad\text{s.t. } f^{in}(x,\omega)\le 0,\quad f^{eq}(x,\omega)=0,\tag{3.4}$$

where the parameter $\omega$ ranges over $\mathbb{R}^M$. The extended-real-valued objective function $p:\mathbb{R}^n\times\mathbb{R}^M\to[-\infty,\infty]$ is assumed to be upper semicontinuous, whereas the vector-valued constraint functions

$$f^{in}:\mathbb{R}^n\times\mathbb{R}^M\to\mathbb{R}^{r_{in}},\qquad f^{eq}:\mathbb{R}^n\times\mathbb{R}^M\to\mathbb{R}^{r_{eq}}$$

are continuous. As usual, the corresponding feasible set mapping

$$X:\mathbb{R}^M\to 2^{\mathbb{R}^n},\qquad X(\omega) := \{x\mid f^{in}(x,\omega)\le 0,\ f^{eq}(x,\omega)=0\}$$

characterizes a closed-valued multifunction. Furthermore, for some reference point $\hat\omega\in\mathbb{R}^M$, the equality constraints are of the form

$$f^{eq}_i(x,\omega) = \langle w_i,x\rangle - h_i(\omega)\qquad\forall i=1,\ldots,r_{eq}\tag{3.5}$$

on an open neighborhood $V$ of $\hat\omega$, where the gradient $w_i\in\mathbb{R}^n$ is a constant vector¹ independent of $\omega$, and $h_i:V\to\mathbb{R}$ is a continuous functional for each $i=1,\ldots,r_{eq}$.

1 Allowing for constraint functions of type (3.5), whose gradients are not constant on the entire space, will avoid tedious case distinctions in Chap. 5.
Definition 3.1. The parametric maximization problem (3.4) satisfies Slater's constraint qualification at the reference point $\hat\omega$ if there is a decision vector $y\in\mathbb{R}^n$ such that

$$f^{in}(y,\hat\omega)<0\qquad\text{and}\qquad f^{eq}(y,\hat\omega)=0,$$

and the gradients of the equality constraints (3.5) are linearly independent. The decision $y$ is referred to as a Slater point.

Linear independence of the gradients $w_i$, $i=1,\ldots,r_{eq}$, automatically holds true if redundant constraints are deleted. Therefore, this requirement is no real restriction. Slater's constraint qualification is extremely useful when dealing with parametric optimization problems since it guarantees that feasibility persists under small perturbations of the parameter. Concretely speaking, assume that Slater's constraint qualification holds at the given reference point $\hat\omega$. Then, (3.4) is not only feasible at the reference point but also in its vicinity (see proposition 3.2 below).

Proposition 3.2. If the parametric optimization problem (3.4) fulfills Slater's constraint qualification at the reference point $\hat\omega\in\mathbb{R}^M$, then the multifunction $X$ has a continuous selector on a neighborhood $V$ of $\hat\omega$. Formally speaking, there exists a continuous function

$$x:V\to\mathbb{R}^n\qquad\text{such that}\qquad x(\omega)\in X(\omega)\ \ \forall\omega\in V.$$

Proof. Let $y$ be a Slater point in $X(\hat\omega)$, which is kept fixed. By continuity of the constraint functions there is a neighborhood $U\times V'$ of $(y,\hat\omega)$ such that

$$f^{in}(x,\omega)<0\qquad\forall(x,\omega)\in U\times V'.$$

If there are no equality constraints, we may simply set $V:=V'$ and $x(\omega):=y$, and the claim follows trivially. In the presence of equality constraints, however, some additional work is necessary. By assumption, the equality constraints are of the form (3.5) on an open neighborhood $V''$ of $\hat\omega$, and the vectors $w_i$, $i=1,\ldots,r_{eq}$, are linearly independent. Moreover, the real-valued functionals $h_i$, $i=1,\ldots,r_{eq}$, are continuous on $V''$. If $r_{eq}<n$, we may complement the given constraint set by additional fictitious constraints of the form

$$f^{eq}_i(x,\omega) = \langle w_i,x\rangle - h_i(\omega)\qquad\forall i=r_{eq}+1,\ldots,n,$$

where $h_i(\omega):=\langle w_i,y\rangle$ is constant. The additional equality constraints are fully determined by a family of new vectors $w_i$, $i=r_{eq}+1,\ldots,n$. These vectors are chosen such that $\{w_i\}_{i=1}^n$ is a basis of $\mathbb{R}^n$. For the further argumentation we need an $n\times n$ matrix $W$ with row vectors $w_1,\ldots,w_n$. In addition, we combine the functionals $h_i$ to a continuous vector-valued mapping $h:=(h_1,\ldots,h_n)$ on $V''$. By construction, $W$ is regular and can be inverted. Thus, we may introduce a well-defined function

$$x(\omega) := W^{-1}h(\omega).$$

Obviously, $x$ is continuous on $V''$, and we have $x(\hat\omega)=y$, since $y$ satisfies all $n$ equations $\langle w_i,y\rangle=h_i(\hat\omega)$. Thus, there is a neighborhood $V$ of $\hat\omega$ such that $V\subset V'\cap V''$ and

$$x(\omega)\in U\qquad\forall\omega\in V.$$

Finally, one may check that the decision $x(\omega)$ satisfies the constraints $f^{eq}(x(\omega),\omega)=0$ and $f^{in}(x(\omega),\omega)<0$ for all parameters $\omega\in V$. This observation completes the proof. □
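The selector constructed in the proof can be computed directly. The Python sketch below completes the equality gradients $w_i$ to a basis of $\mathbb{R}^n$ and evaluates $x(\omega) = W^{-1}h(\omega)$. The choice of the fictitious vectors as an orthonormal basis of the null space of the original gradient matrix (obtained from an SVD) is our own assumption; the proof only requires that $\{w_i\}_{i=1}^n$ be a basis. The instance (one constraint $x_1+x_2+x_3=\omega$, bounds $x_i<2$, reference point $\hat\omega=1$) is hypothetical.

```python
import numpy as np

def slater_selector(W_eq, h_eq, y):
    """Continuous selector x(omega) = W^{-1} h(omega) as in the proof of proposition 3.2.

    W_eq : (r_eq, n) array whose rows are the linearly independent gradients w_i.
    h_eq : callable omega -> array of shape (r_eq,), the functionals h_i.
    y    : Slater point at the reference parameter.
    """
    r_eq, _ = W_eq.shape
    # Rows r_eq..n-1 of V^T span the null space of W_eq (the orthogonal complement
    # of its row space), so stacking them below W_eq yields a regular n x n matrix W.
    _, _, Vt = np.linalg.svd(W_eq)
    W_fict = Vt[r_eq:]
    W = np.vstack([W_eq, W_fict])
    h_fict = W_fict @ y  # constant fictitious right-hand sides h_i := <w_i, y>

    def x(omega):
        h = np.concatenate([h_eq(omega), h_fict])
        return np.linalg.solve(W, h)
    return x

# Hypothetical instance: x1 + x2 + x3 = omega and x_i - 2 < 0, reference omega_hat = 1.
W_eq = np.array([[1.0, 1.0, 1.0]])

def h_eq(omega):
    return np.array([omega])

y = np.array([0.3, 0.3, 0.4])  # Slater point at omega_hat = 1
x = slater_selector(W_eq, h_eq, y)
print(x(1.0), x(1.3))
```

With this particular completion, $x(\omega)-y$ always points along $w_1$, so the selector moves the Slater point as little as possible (in the Euclidean norm) to restore the equality constraint.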

Corollary 3.3. Under the assumptions of proposition 3.2 the parametric optimization problem (3.4) satisfies Slater's constraint qualification at every $\omega$ in some neighborhood of $\hat\omega$.

Proof. Consider the mapping $x:V\to\mathbb{R}^n$ in the proof of proposition 3.2. For any $\omega\in V$ one can easily check that $x(\omega)$ represents a Slater point for the optimization problem (3.4) with respect to the reference point $\omega$. □



3.2.2 Convex Functions 

The formulation of appropriate regularity conditions for problem (3.1) not 
only requires suitable constraint qualifications but also a precise definition 
of convex and concave functions. Classical literature on convex analysis usu- 
ally assumes the domain of convex functions to be convex [89]. However, in 
the remainder of this work we will sometimes need a generalized notion of 
convexity allowing for extended-real-valued functions defined on nonconvex 
sets. Such functions will naturally arise in Chap. 5. Moreover, for the sake 
of transparent terminology, we aim at extending the notion of convexity to
vector-valued mappings, as is done e.g. in [67]. This will be particularly useful
when dealing with constraint functions. The following definition provides a 
precise characterization of convexity as understood in this work. 

Definition 3.4. (Convex Functions)

(i) A function $p:\mathbb{R}^n\to[-\infty,\infty]$ is convex on an arbitrary set $U\subset\mathbb{R}^n$ if for every choice of $x_0\in U$ and $x_1\in U$ one has

$$p(x_\lambda)\le(1-\lambda)\,p(x_0)+\lambda\,p(x_1)$$

for all $\lambda\in(0,1)$ such that $x_\lambda=(1-\lambda)x_0+\lambda x_1\in U$.

(ii) A mapping $f:\mathbb{R}^n\to[-\infty,\infty]^r$ is convex on an arbitrary set $U\subset\mathbb{R}^n$ if each component $f_i$ is convex on $U$ for $i=1,\ldots,r$.
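The distinctive feature of definition 3.4 is that only those convex combinations $x_\lambda$ which fall into $U$ impose a condition. The following Python sketch tests the defining inequality on sampled points for finite-valued functions on the hypothetical nonconvex set $U=[-2,-1]\cup[1,2]$; extended-real values are deliberately not handled. Being sample-based, such a check can only refute convexity on $U$, never prove it.

```python
import itertools
import numpy as np

def is_convex_on(p, in_U, samples, lambdas, tol=1e-12):
    """Sampled check of definition 3.4 (i) for a finite-valued function p.

    samples: points of U; in_U: membership test for U. Segments may leave U;
    only the points x_lam that lie in U impose the convexity inequality.
    """
    for x0, x1 in itertools.product(samples, repeat=2):
        for lam in lambdas:
            x_lam = (1 - lam) * x0 + lam * x1
            if in_U(x_lam) and p(x_lam) > (1 - lam) * p(x0) + lam * p(x1) + tol:
                return False  # found a violating triple (x0, x1, lam)
    return True

def in_U(x):
    """Hypothetical nonconvex set U = [-2, -1] u [1, 2]."""
    return 1.0 <= abs(x) <= 2.0

samples = np.concatenate([np.linspace(-2.0, -1.0, 21), np.linspace(1.0, 2.0, 21)])
lambdas = np.linspace(0.05, 0.95, 19)

# x**2 is convex on the whole real line and hence on U; -x**2 is not convex on U.
print(is_convex_on(lambda x: x ** 2, in_U, samples, lambdas))
print(is_convex_on(lambda x: -x ** 2, in_U, samples, lambdas))
```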
Infinite values of convex functions are handled by using extended arithmetic as explained in [97, Sect. 1.E]. Most importantly, one should use the convention $\infty-\infty=-\infty$. An extended-real-valued function $p$ is said to be concave if $-p$ is convex; similarly, a mapping $f$ is concave if $-f$ is convex. When dealing with concave functions, one usually adopts the convention $\infty-\infty=\infty$, which gives rise to a slightly different form of extended arithmetic.



3.2.3 Block-Diagonal Autoregressive Processes



In the present chapter we are interested in stochastic programs whose recourse 
functions can be shown to exhibit a specific saddle structure. This forces 
us to restrict attention to a specific subclass of the nonlinear autoregressive 
processes considered in Chap. 2. Concretely speaking, we will have to study 
the class of so-called block-diagonal autoregressive processes. 



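Definition 3.5. We say that the random data $\{\eta_t,\xi_t\}_{t\in\tau}$ follows a block-diagonal autoregressive process if for every $t=1,\ldots,T$ there are two possibly correlated random vectors $\varepsilon^o_t$ and $\varepsilon^r_t$ with induced probability space $(\mathcal{E}^o_t\times\mathcal{E}^r_t,\mathcal{B}(\mathcal{E}^o_t\times\mathcal{E}^r_t),Q_t)$ and two matrices $H^o_t\in\mathbb{R}^{K_t\times K^{t-1}}$ and $H^r_t\in\mathbb{R}^{L_t\times L^{t-1}}$ such that

$$\eta_t = H^o_t\,\eta^{t-1}+\varepsilon^o_t\qquad\text{and}\qquad \xi_t = H^r_t\,\xi^{t-1}+\varepsilon^r_t.$$

The initial data $(\eta_0,\xi_0)$ and the disturbances $\{\varepsilon^o_t,\varepsilon^r_t\}_{t=1,\ldots,T}$ are mutually independent, and for each $t=1,\ldots,T$ we postulate:

- $\mathcal{E}^o_t\subset\mathbb{R}^{K_t}$ and $\mathcal{E}^r_t\subset\mathbb{R}^{L_t}$ are Borel sets;

- $Q_t$ is a regular probability measure on $(\mathcal{E}^o_t\times\mathcal{E}^r_t,\mathcal{B}(\mathcal{E}^o_t\times\mathcal{E}^r_t))$.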



Block-diagonal autoregressive processes exhibit a linear dependence on the outcome history. Moreover, the AR coefficient matrices of all stages are block-diagonal, i.e. $\eta_t$ depends only on the history $\eta^{t-1}$, and $\xi_t$ only on the history $\xi^{t-1}$. The reason for this nonstandard requirement will become clear in Sect. 3.5.
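A block-diagonal autoregressive process is easy to simulate, and doing so makes the decoupling visible: changing the $\xi$-history leaves the $\eta$-path untouched. The Python sketch below assumes scalar components ($K_t=L_t=1$), $T=2$, hypothetical coefficient matrices, and disturbances $(\varepsilon^o_t,\varepsilon^r_t)$ that are correlated within a stage through a common uniform shock but drawn afresh for every stage. These data are illustrative assumptions only.

```python
import numpy as np

def simulate_block_diagonal_ar(eta0, xi0, H_o, H_r, sample_noise, rng):
    """Simulate eta_t = H_o[t] @ eta^{t-1} + eps_o_t and xi_t = H_r[t] @ xi^{t-1} + eps_r_t.

    H_o[t] has shape (K_t, K^{t-1}) and H_r[t] shape (L_t, L^{t-1}); index 0 is unused.
    sample_noise(t, rng) returns the pair (eps_o_t, eps_r_t); the two parts may be
    correlated with each other, but a fresh draw is made for every stage t.
    """
    eta_hist = [np.asarray(eta0, dtype=float)]
    xi_hist = [np.asarray(xi0, dtype=float)]
    for t in range(1, len(H_o)):
        eps_o, eps_r = sample_noise(t, rng)
        # Block-diagonal structure: eta_t sees only eta^{t-1}, xi_t only xi^{t-1}.
        eta_hist.append(H_o[t] @ np.concatenate(eta_hist) + eps_o)
        xi_hist.append(H_r[t] @ np.concatenate(xi_hist) + eps_r)
    return eta_hist, xi_hist

# Hypothetical instance with T = 2 and scalar eta_t, xi_t.
H_o = [None, np.array([[0.5]]), np.array([[0.2, 0.3]])]
H_r = [None, np.array([[1.0]]), np.array([[0.0, 0.9]])]

def correlated_noise(t, rng):
    u = rng.uniform(-1.0, 1.0)  # common shock: eps_o and eps_r are correlated within stage t
    return np.array([u]), np.array([-0.5 * u])

def fixed_noise(t, rng):
    """Deterministic disturbances, convenient for checking the recursion by hand."""
    return np.array([1.0]), np.array([-1.0])

rng = np.random.default_rng(0)
eta, xi = simulate_block_diagonal_ar([4.0], [1.0], H_o, H_r, correlated_noise, rng)
print(np.concatenate(eta), np.concatenate(xi))
```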




3.3 Regularity Conditions 




Using the notation introduced so far, we require the stochastic program (3.1) on the augmented probability space to satisfy the following conditions:

(B1) - $\Theta_t$ is a compact simplex in $\mathbb{R}^{K_t}$ covering the support of the random vector $\eta_t$, $t\in\tau$;

- $\Xi_t$ is a compact simplex in $\mathbb{R}^{L_t}$ covering the support of the random vector $\xi_t$, $t\in\tau$;

(B2) - the profit function $p_t$ is continuous on the underlying Euclidean space and concave in $x^t$ for fixed values of the stochastic parameters, $t\in\tau$;

- there is a convex neighborhood of $Y^t$ on which $p_t$ is a saddle function being concave in $x^t$, convex in $\eta^t$, and constant in $\xi^t$, $t\in\tau$;

(B3) - the constraint function $f_t$ is continuous on the underlying Euclidean space and convex in $x^t$ for fixed values of the stochastic parameters, $t\in\tau$;

- there is a convex neighborhood of $Y^t$ on which $f_t$ is jointly convex in $(x^t,\xi^t)$ and constant in $\eta^t$, $t\in\tau$;

- the feasible set mapping $X_t$ is bounded on a neighborhood of $Z^t$, $t\in\tau$;

(B4) the random data $\{\eta_t,\xi_t\}_{t\in\tau}$ follows a block-diagonal autoregressive process; $\mathcal{E}^o_t$ and $\mathcal{E}^r_t$ are compact simplices for all $t=1,\ldots,T$;

(B5) the parametric maximization problems (3.2) fulfill Slater's constraint qualification at any reference point in $Z^t$, $t\in\tau$.



Assumption (B1) implies that the marginal spaces $\Theta_t$ and $\Xi_t$ are simplicial sets with non-empty interior. This, of course, can always be enforced by appropriate definitions, given that the support of the random variables is bounded. As argued in Sect. 2.4, the solution of a stochastic program depends only on the restriction of $p_t$ and $f_t$ to the natural domain $Y^t$ for every $t\in\tau$. Thus, one might expect that (B2) and (B3) should exclusively qualify the behavior of $p_t$ and $f_t$ on $Y^t$. However, specific structural properties of the profit and constraint functions are required to hold on a neighborhood of $Y^t$. This generalization is necessary for technical reasons and allows us to prove subdifferentiability of the recourse functions below. Assumption (B3) implies that the feasible set mapping $X_t$ is convex-closed-valued. Moreover, (B3) requires both $f^{eq}_t$ and $-f^{eq}_t$ to be convex in their arguments, implying that all equality constraints are affine on a convex neighborhood of $Y^t$. Assumption (B4) basically states that the random data follows a block-diagonal autoregressive process driven by serially independent noise. The matrices $H^o_t$ and $H^r_t$ contain the non-vanishing AR coefficients. Finally, condition (B5) states that the values of the feasible set mapping $X_t$ have non-empty relative interior on the natural domain $Z^t$ due to the existence of Slater points and continuity of the constraint functions. This implies non-anticipativity of the constraint multifunction.

In the remainder of this section we will derive some elementary impli- 
cations of the new regularity conditions. In particular, we will argue that 
(B1)-(B5) entail the more fundamental conditions (A1)-(A5). 

Proposition 3.6. Under the assumptions (B1) and (B3) the generalized feasible set $Y^t$ is convex for $t\in\tau$.

Proof. The assertion follows immediately from the convexity properties of the constraint functions. □

Proposition 3.7. Under the assumptions (B1)-(B5) the feasible set mapping $X_t$ is non-empty-valued on a neighborhood of $Z^t$, $t\in\tau$.

Proof. Apply proposition 3.2 to the parametric maximization problems (3.2a), (3.2b), and (3.2c) at an arbitrary reference point in $Z^t$, $t\in\tau$. Consequently, there exists a feasible decision vector for every outcome and decision history in a neighborhood of the reference point. As the reference point can be chosen freely in $Z^t$, the claim follows. □

Proposition 3.8. Under the assumptions (B1)-(B5) the multifunction $X_t$ is upper semicontinuous on a neighborhood of $Z^t$ for all $t\in\tau$.

Proof. By boundedness of the feasible set mapping $X_t$, there is a neighborhood $V$ of $Z^t$ such that $X_t(V)$ represents a bounded subset of $\mathbb{R}^{n_t}$. Moreover, continuity of the constraint functions implies that the graph of the multifunction $X_t$ is closed. These observations allow us to invoke [13, proposition 11.9 (b)] to prove that $X_t$ is usc on $V$. □

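Proposition 3.9. Under the regularity conditions (B1)-(B5) the multifunction $X_t$ is compact-valued on a neighborhood of $Z^t$, $t\in\tau$.

Proof. This is immediate from assumption (B3): the values of $X_t$ are closed by continuity of the constraint functions, and they are bounded on a neighborhood of $Z^t$. □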

Proposition 3.10. Consider a stochastic program subject to the regularity conditions (B1)-(B5). Then, for any $t\in\tau$ and for every neighborhood $U$ of $Y^t$ there is a neighborhood $V$ of $Z^t$ such that the graph of $X_t$ over $V$ is a subset of $U$, i.e.

$$\operatorname{graph}X_t|_V\subset U.$$

Proof. Without loss of generality we may consider the case $t>0$. Select a reference point $(x^{t-1},\eta^t,\xi^t)\in Z^t$. Since the feasible set mapping $X_t$ is compact-valued on $Z^t$, there exist open sets $V'$ and $W$ with

$$(x^{t-1},\eta^t,\xi^t)\in V',\qquad X_t(x^{t-1},\xi^t)\subset W,\qquad\text{and}\qquad V'\times W\subset U.$$

Furthermore, since $X_t$ is usc on $Z^t$, there is an open neighborhood $V''$ of the reference point such that

$$X_t(\tilde x^{t-1},\tilde\xi^t)\subset W\qquad\forall(\tilde x^{t-1},\tilde\eta^t,\tilde\xi^t)\in V''.$$

Therefore, the graph of $X_t$ over the open set $V'\cap V''$ is covered by $U$. As the choice of the reference point was arbitrary, there is a neighborhood $V$ of $Z^t$ such that the graph of $X_t$ over $V$ is contained in $U$. □

We may use the above results to show that the assumptions (B1)-(B5) imply the fundamental regularity conditions (A1)-(A5). Consider thus a stochastic program of the form (3.1) subject to (B1)-(B5). Then, the conditions (A1), (A4), and (A5) are trivially fulfilled.² Moreover, proposition 3.8 entails (A3), which, as implied by proposition 2.3, is equivalent to compactness of the generalized feasible sets $\{Y^t\}_{t\in\tau}$. Assumption (B2) requires the profit function $p_t$ to be continuous (and a fortiori usc) on a convex neighborhood of $Y^t$ for all $t\in\tau$. Furthermore, the Weierstrass maximum theorem ensures that $p_t$ is bounded on the compact set $Y^t$. Thus, (A2) follows. In summary, the stochastic program (3.1) satisfies each of the fundamental regularity conditions (A1)-(A5). Therefore, the statements of proposition 2.5 and theorem 2.6 hold, i.e. the static and dynamic versions of the stochastic program (3.1) are well-defined and solvable, and the optimal values coincide.

Proposition 3.8 proves upper semicontinuity of the feasible set mapping $X_t$. This is not the sharpest result available. In addition, as $X_t$ has a convex graph, it is easily shown to be lower semicontinuous on a neighborhood of $Z^t$. This follows from theorem 5.9 (b) in [97]. Notice, however, that lower semicontinuity is referred to as 'inner semicontinuity' in that reference.

Being both usc and lsc, the feasible set mapping $X_t$ is continuous on its natural domain $Z^t$. This observation allows us to prove that the assumptions (B1)-(B5) also imply (A1)'-(A5)'. Therefore, given the specific regularity conditions (B1)-(B5), the recourse functions are continuous on their natural domains by virtue of proposition 2.7. However, below we will argue that the recourse functions are not only continuous but also subdifferentiable and saddle-shaped. These properties will turn out to be more important than continuity.



2 In order to check these regularity conditions set $\Omega := \Theta\times\Xi$ and $\omega := (\eta,\xi)$.
3.4 sup-Projections

In order to infer the saddle property of the recourse functions, we need two 
basic results about sup-projections, which are due to Rockafellar (see [89, 
theorem 5.5] and [90, theorem 1]). 

Proposition 3.11. Let $p(x,\eta)$ be an extended-real-valued function on $\mathbb{R}^n\times\mathbb{R}^K$ which is convex in $\eta$. Then, the sup-projection $\Phi(\eta) := \sup_x p(x,\eta)$ is a convex function on $\mathbb{R}^K$.

Proof. Consider the family $\{p(x,\cdot)\}_{x\in\mathbb{R}^n}$ of convex functions indexed by the parameter $x$. The epigraph of $\Phi$ is characterized by the infinite intersection of the convex epigraphs of $p(x,\cdot)$, $x\in\mathbb{R}^n$, and constitutes a convex set. Consequently, $\Phi$ is a convex function on $\mathbb{R}^K$. □
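Proposition 3.11 can be observed numerically. The Python sketch below uses the hypothetical function $p(x,\eta)=x\eta-x^2$, which is affine (hence convex) in $\eta$ for every fixed $x$ but concave in $x$; its exact sup-projection is $\Phi(\eta)=\eta^2/4$. Replacing the supremum over $\mathbb{R}$ by a maximum over a finite grid of $x$ values (an assumption of the sketch) still yields a pointwise maximum of convex functions of $\eta$, so the grid approximation is itself convex, in line with the epigraph argument of the proof.

```python
import numpy as np

# Hypothetical example: p(x, eta) = x*eta - x**2, affine in eta for every fixed x;
# its exact sup-projection is Phi(eta) = eta**2 / 4.
xs = np.linspace(-3.0, 3.0, 601)   # finite grid replacing the sup over x in R
etas = np.linspace(-2.0, 2.0, 81)  # uniform grid of parameter values

P = np.outer(xs, etas) - (xs ** 2)[:, None]  # P[i, j] = p(xs[i], etas[j])
phi = P.max(axis=0)                          # pointwise max over x of the functions p(x, .)

# On a uniform grid, convexity shows up as nonnegative second differences.
second_diff = phi[:-2] - 2.0 * phi[1:-1] + phi[2:]
print(second_diff.min(), np.abs(phi - etas ** 2 / 4).max())
```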

Proposition 3.12. Let $p(x,\xi)$ be an extended-real-valued function on $\mathbb{R}^n\times\mathbb{R}^L$ which is jointly concave in $x$ and $\xi$. Then, the sup-projection $\Phi(\xi) := \sup_x p(x,\xi)$ is a concave function on $\mathbb{R}^L$.

Proof. Denote by $E$ the projection of $\operatorname{hypo}p$ on $\mathbb{R}^L\times\mathbb{R}$. The hypograph of $\Phi$ coincides with $E$ except for some special boundary points (the 'vertical' fibres of $\operatorname{hypo}\Phi$ must be closed for all fixed vectors $\xi\in\mathbb{R}^L$):

$$\operatorname{hypo}\Phi = \{(\xi,\alpha)\in\mathbb{R}^L\times\mathbb{R}\mid(\xi,\beta)\in E\ \ \forall\beta<\alpha\}.$$

By assumption, the hypograph of the objective function $p$ is convex, and since convexity is preserved under projections, $E$ is convex, as well. This, in turn, entails convexity of $\operatorname{hypo}\Phi$, and therefore $\Phi$ is a concave function on $\mathbb{R}^L$. □

The conditions of proposition 3.12 cannot be relaxed to allow for biconcave functions $p$. For example, consider the following extended-real-valued mapping defined on $\mathbb{R}^2$:

$$p(x,\xi) := \begin{cases}x\xi & \text{for }(x,\xi)\in[-1,1]\times\mathbb{R},\\ -\infty & \text{else.}\end{cases}$$

Clearly, $p(\cdot,\xi)$ is concave in $x$ for every fixed value of $\xi$, and the parametric function $p(x,\cdot)$ is concave in $\xi$. However, $p$ is not jointly concave in $x$ and $\xi$; for instance, $p(0,0)=0<1=\tfrac12 p(-1,-1)+\tfrac12 p(1,1)$. The corresponding sup-projection

$$\Phi(\xi) := \sup_x p(x,\xi) = |\xi|,\qquad\xi\in\mathbb{R},$$

is convex instead of being concave. Figure 3.2 depicts the nonconvex hypograph of $p$ and demonstrates the construction of $\operatorname{hypo}\Phi$. Thus, sup-projections of biconcave functions $p$ are not necessarily concave.
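The counterexample can be reproduced in a few lines of Python. The sketch evaluates the sup-projection over a grid of $x$ values in $[-1,1]$; because $x\mapsto x\xi$ is linear, its maximum is attained at $x=\pm1$, which are grid points, so the grid value equals $|\xi|$ exactly. It then exhibits the midpoint violations of concavity for both $\Phi$ and $p$.

```python
import math
import numpy as np

def p(x, xi):
    """Biconcave counterexample: x * xi on [-1, 1] x R and -inf elsewhere."""
    return x * xi if -1.0 <= x <= 1.0 else -math.inf

xs = np.linspace(-1.0, 1.0, 201)  # grid covering the effective domain [-1, 1] in x

def sup_projection(xi):
    """Grid evaluation of Phi(xi) = sup_x p(x, xi); exact here since x = -1, 1 are grid points."""
    return max(p(x, xi) for x in xs)

# Concavity of Phi would require Phi(0) >= (Phi(-1) + Phi(1)) / 2.
print(sup_projection(-1.0), sup_projection(0.0), sup_projection(1.0))
# Joint concavity of p fails on the segment from (-1, -1) to (1, 1).
print(p(0.0, 0.0), 0.5 * (p(-1.0, -1.0) + p(1.0, 1.0)))
```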



Fig. 3.2. Example of a non-concave function $p(x,\xi)$, which is, however, concave in each of its arguments. The corresponding sup-projection $\Phi(\xi)$ is convex, and its hypograph is given by the projection of $\operatorname{hypo}p$ along the $x$-axis
The following technical proposition is based on the results in Sect. 3.3 and will be needed to establish the saddle property of the recourse functions on a neighborhood of their natural domains.



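Proposition 3.13. Consider a stochastic program subject to the regularity conditions (B1)-(B5). Then, there are open convex sets

$$Y^t_{x\xi}\subset\mathbb{R}^{n^t}\times\mathbb{R}^{L^t},\quad Y^t_{\eta}\subset\mathbb{R}^{K^t},\quad Z^t_{x\xi}\subset\mathbb{R}^{n^{t-1}}\times\mathbb{R}^{L^t},\quad Z^t_{\eta}\subset\mathbb{R}^{K^t},\qquad t\in\tau,\tag{3.6}$$

with the following properties:

(a) $Y^t_{x\xi}\times Y^t_\eta$ is a neighborhood of $Y^t$, $t\in\tau$;

(b) $Z^t_{x\xi}\times Z^t_\eta$ is a neighborhood of $Z^t$, $t\in\tau$;

(c) $p_t$ is a continuous saddle function on $Y^t_{x\xi}\times Y^t_\eta$ being concave in $x^t$, convex in $\eta^t$, and constant in $\xi^t$, $t\in\tau$;

(d) $f_t$ is a continuous convex function on $Y^t_{x\xi}\times Y^t_\eta$ being jointly convex in $(x^t,\xi^t)$ and constant in $\eta^t$, $t\in\tau$;

(e) $Y^t_{x\xi}\times Y^t_\eta\times\Theta_{t+1}\times\Xi_{t+1}\subset Z^{t+1}_{x\xi}\times Z^{t+1}_\eta$ for all $t=0,\ldots,T-1$;

(f) the graph of $X_t$ over $Z^t_{x\xi}\times Z^t_\eta$ is a subset of $Y^t_{x\xi}\times Y^t_\eta$, $t\in\tau$;

(g) the multifunction $X_t$ is non-empty-compact-valued on $Z^t_{x\xi}\times Z^t_\eta$, $t\in\tau$.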

Proof. The claim is proved by backward induction. By the regularity conditions (B2) and (B3) there are open convex sets $Y^T_{x\xi}$ and $Y^T_\eta$ which satisfy the requirements (a), (c), and (d). This observation completes the basis step. Assume now that the existence of suitable sets $Y^t_{x\xi}$ and $Y^t_\eta$ has been shown for some $t\in\tau$. Then, propositions 3.7, 3.9, and 3.10 ensure the existence of open convex sets $Z^t_{x\xi}$ and $Z^t_\eta$ which satisfy the requirements (b), (f), and (g). If $t=0$, we are finished. Otherwise, by compactness of $Z^t = Y^{t-1}\times\Theta_t\times\Xi_t$ (cf. propositions 2.3 and 3.8) and by the regularity conditions (B2) and (B3) there are suitable sets $Y^{t-1}_{x\xi}$ and $Y^{t-1}_\eta$ which satisfy the requirements (a), (c), (d), and (e). Therefore, the induction step is established, and the claim follows. □

Below we will assume that a collection of sets as in (3.6) is given, and that 
these sets exhibit the properties postulated in proposition 3.13. Now we are 
ready to state the main result of this section. 

Theorem 3.14 (Saddle Property I). Under the conditions (B1)-(B5) the recourse function $\Phi_t$ has a saddle structure on a convex neighborhood of $Z^t$, $t\in\tau$. Concretely speaking, $\Phi_t$ is concave in $(x^{t-1},\xi^t)$ and convex in $\eta^t$, $t=1,\ldots,T$, while $\Phi_0$ is concave in $\xi_0$ and convex in $\eta_0$. Moreover, the expectation functional $(\mathbb{E}_t\Phi_{t+1})$ has a saddle structure on a convex neighborhood of $Y^t$, being concave in $(x^t,\xi^t)$ and convex in $\eta^t$, $t=0,\ldots,T-1$.

Proof. We will argue that $\Phi_t$ has a saddle structure on $Z^t_{x\xi}\times Z^t_\eta$ for all $t\in\tau$, whereas $(\mathbb{E}_t\Phi_{t+1})$ has a saddle structure on $Y^t_{x\xi}\times Y^t_\eta$ for all $t=0,\ldots,T-1$. The proof is by backward induction on $t$. First, introduce an auxiliary objective function

$$\hat p_T := \begin{cases}\bar p_T & \text{on } Y^T_{x\xi}\times Y^T_\eta,\\ +\infty & \text{on } Y^T_{x\xi}\times(Y^T_\eta)^c,\\ -\infty & \text{everywhere else,}\end{cases}$$

and a sup-projection

$$\hat\Phi_T(x^{T-1},\eta^T,\xi^T) := \sup_{x_T}\hat p_T(x^T,\eta^T,\xi^T).$$

By the assertions (c) and (d) of proposition 3.13, the extended-real-valued auxiliary function $\hat p_T$ exhibits a saddle structure on $\mathbb{R}^{n^T}\times\mathbb{R}^{K^T}\times\mathbb{R}^{L^T}$, being concave in $(x^T,\xi^T)$ and convex in $\eta^T$. Then, by proposition 3.11 the value function $\hat\Phi_T$ is convex in $\eta^T$ for all fixed values of $(x^{T-1},\xi^T)$. Moreover, proposition 3.12 implies concavity of $\hat\Phi_T$ in $(x^{T-1},\xi^T)$ for all fixed values of $\eta^T$. By the construction of the auxiliary objective function and by statement (f) of proposition 3.13 we have $\hat\Phi_T=\Phi_T$ on $Z^T_{x\xi}\times Z^T_\eta$. This implies that $\Phi_T$ shows the postulated saddle structure on the open product set $Z^T_{x\xi}\times Z^T_\eta$. Consequently, the basis step is established.

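Assume now that the recourse function of stage $t+1$ exhibits the postulated saddle property, i.e. assume that $\Phi_{t+1}$ is a saddle function on the product set $Z^{t+1}_{x\xi}\times Z^{t+1}_\eta$. Then, by assertion (e) of proposition 3.13 we may conclude that $\Phi_{t+1}$ is a saddle function on $Y^t_{x\xi}\times Y^t_\eta\times\Theta_{t+1}\times\Xi_{t+1}$. Next, consider the expectation functional

$$\begin{aligned}(\mathbb{E}_t\Phi_{t+1})(x^t,\eta^t,\xi^t) &= \int_{\Theta_{t+1}\times\Xi_{t+1}}\Phi_{t+1}(x^t,\eta^t,\eta_{t+1},\xi^t,\xi_{t+1})\,dP_{t+1}(\eta_{t+1},\xi_{t+1}\mid\eta^t,\xi^t)\\ &= \int_{\mathcal{E}^o_{t+1}\times\mathcal{E}^r_{t+1}}(\Phi_{t+1}\circ H_{t+1})(x^t,\eta^t,\varepsilon^o_{t+1},\xi^t,\varepsilon^r_{t+1})\,dQ_{t+1}(\varepsilon^o_{t+1},\varepsilon^r_{t+1}).\end{aligned}$$

The transformation $H_{t+1}$ is linear, and its matrix is given by (cf. (B4))

$$H_{t+1} := \begin{pmatrix} I_{n^t} & 0 & 0 & 0 & 0\\ 0 & I_{K^t} & 0 & 0 & 0\\ 0 & H^o_{t+1} & I_{K_{t+1}} & 0 & 0\\ 0 & 0 & 0 & I_{L^t} & 0\\ 0 & 0 & 0 & H^r_{t+1} & I_{L_{t+1}}\end{pmatrix}.$$

Thereby, $I_{n^t}$ denotes the identity matrix on $\mathbb{R}^{n^t}$ etc. Notice that $H_{t+1}$ preserves volumes and does not mix the arguments in which $\Phi_{t+1}$ is concave with those in which it is convex. This implies that $\Phi_{t+1}\circ H_{t+1}$ is a saddle function on $Y^t_{x\xi}\times Y^t_\eta\times\mathcal{E}^o_{t+1}\times\mathcal{E}^r_{t+1}$. Finally, since the noise distribution $Q_{t+1}$ does not depend on $(\eta^t,\xi^t)$, the expectation functional inherits the saddle structure of the integrand: the operation $\int\cdot\,dQ_{t+1}$ can be viewed as taking a nonnegative linear combination of saddle functions over $Y^t_{x\xi}\times Y^t_\eta$. An equivalent argument for purely convex functions can be found in Wets [108, proposition 2.1].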
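The mechanism of the induction step, composing a saddle function with the block-diagonal map $H_{t+1}$ and averaging against a noise law that ignores the past, can be spot-checked numerically. The Python sketch below assumes a hypothetical scalar recourse function $\Phi_{t+1}(x,\eta,\xi)=-(x-\xi)^2+\eta^2$ (jointly concave in $(x,\xi)$, convex in $\eta$), scalar AR coefficients and a three-point noise distribution. It tests midpoint concavity in $(x,\xi)$ and midpoint convexity in $\eta$ at random points; passing such tests is evidence, not a proof.

```python
import numpy as np

# Hypothetical stage-(t+1) recourse function (an assumption for illustration):
# jointly concave in (x, xi_next) and convex in eta_next, i.e. a saddle function.
def phi_next(x, eta_next, xi_next):
    return -(x - xi_next) ** 2 + eta_next ** 2

H_O, H_R = 0.8, 0.5  # assumed scalar AR coefficients H^o_{t+1} and H^r_{t+1}
# Assumed discrete noise law Q_{t+1} as (eps_o, eps_r, probability); independent of (eta, xi).
NOISE = [(-0.1, 0.2, 0.25), (0.0, -0.3, 0.5), (0.3, 0.1, 0.25)]

def expectation_functional(x, eta, xi):
    """Integrate Phi_{t+1} o H_{t+1} against Q_{t+1}: a nonnegative combination of saddle functions."""
    return sum(q * phi_next(x, H_O * eta + e_o, H_R * xi + e_r) for e_o, e_r, q in NOISE)

rng = np.random.default_rng(0)
ok_concave = ok_convex = True
for _ in range(1000):
    (x0, eta0, xi0), (x1, eta1, xi1) = rng.uniform(-2.0, 2.0, size=(2, 3))
    # Midpoint concavity in (x, xi) with eta held fixed.
    mid = expectation_functional((x0 + x1) / 2, eta0, (xi0 + xi1) / 2)
    avg = (expectation_functional(x0, eta0, xi0) + expectation_functional(x1, eta0, xi1)) / 2
    ok_concave = ok_concave and bool(mid >= avg - 1e-12)
    # Midpoint convexity in eta with (x, xi) held fixed.
    mid = expectation_functional(x0, (eta0 + eta1) / 2, xi0)
    avg = (expectation_functional(x0, eta0, xi0) + expectation_functional(x0, eta1, xi0)) / 2
    ok_convex = ok_convex and bool(mid <= avg + 1e-12)

print(ok_concave, ok_convex)
```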

The assertions (c) and (d) of proposition 3.13 guarantee that the effective profit function $\bar p_t$ is jointly concave in $(x^t,\xi^t)$ and convex in $\eta^t$ on $Y^t_{x\xi}\times Y^t_\eta$. Next, introduce an auxiliary objective function (see Fig. 3.3)

$$\hat p_t := \begin{cases}\bar p_t+(\mathbb{E}_t\Phi_{t+1}) & \text{on } Y^t_{x\xi}\times Y^t_\eta,\\ +\infty & \text{on } Y^t_{x\xi}\times(Y^t_\eta)^c,\\ -\infty & \text{everywhere else,}\end{cases}$$

and a sup-projection

$$\hat\Phi_t(x^{t-1},\eta^t,\xi^t) := \sup_{x_t}\hat p_t(x^t,\eta^t,\xi^t).$$

According to our previous results both $(\mathbb{E}_t\Phi_{t+1})$ and $\bar p_t$ are saddle functions on the convex product set $Y^t_{x\xi}\times Y^t_\eta$. Thus, the extended-real-valued auxiliary function $\hat p_t$ exhibits a saddle structure on $\mathbb{R}^{n^t}\times\mathbb{R}^{K^t}\times\mathbb{R}^{L^t}$, being concave in $(x^t,\xi^t)$ and convex in $\eta^t$. Now we are prepared to apply the specific results about sup-projections. By proposition 3.11, the value function $\hat\Phi_t$ is convex in $\eta^t$ for all fixed values of $(x^{t-1},\xi^t)$. Moreover, proposition 3.12 implies concavity of $\hat\Phi_t$ in $(x^{t-1},\xi^t)$ for all fixed values of $\eta^t$. By the construction of the auxiliary objective function and by statement (f) of proposition 3.13 it is clear that $\hat\Phi_t=\Phi_t$ on $Z^t_{x\xi}\times Z^t_\eta$. This implies that $\Phi_t$ shows the postulated saddle structure on $Z^t_{x\xi}\times Z^t_\eta$. Hence, the induction step is complete, and the claim follows. □

Fig. 3.3. $Y^t_{x\xi}\times Y^t_\eta$ is a convex neighborhood of $Y^t$ (grey shaded area), where both the effective profit function $\bar p_t$ and the expectation functional $(\mathbb{E}_t\Phi_{t+1})$ have a characteristic saddle structure. Thus, the auxiliary objective function $\hat p_t$ exhibits a saddle structure on the entire underlying space

Theorem 3.14 is a slight generalization of the results in [43, Sect. 2] since convexity of the profit functions in $\eta$ as well as convexity of the constraint functions in $\xi$ are only required to hold on a neighborhood of $Y^t$ instead of the entire underlying space. Moreover, in [43] the level set $\operatorname{lev}_{\le 0}f_t$ is required to be compact. Here, however, we basically postulate compactness of the generalized feasible set $Y^t$, which is generically a proper subset of $\operatorname{lev}_{\le 0}f_t$ (see also Fig. 2.3). Notice that condition (B3) allows for a special class of separable constraint functions $f_t=f^1_t+f^2_t$, where $f^1_t$ is convex in $x^t$ and constant in $\xi^t$, whereas $f^2_t$ is convex in $\xi^t$ and constant in $x^t$.

In Chap. 5 we will further investigate the dynamic versions of specific stochastic programs. Concretely speaking, we will study the Lagrangian duals of some optimization problems of the form (3.2). As in the classical literature on convex analysis [89,90], the objective functions of these maximization problems must be concave in the decision variables. This requirement is guaranteed by the following corollary 3.15, which states that the recourse functions and the expectation functionals are concave in the decision variables not only on their natural domains but on the entire underlying spaces.

Corollary 3.15. Under the conditions (B1)-(B5), the recourse function $\Phi_t$ is concave in $x^{t-1}$ on the entire space, $t\in\tau_{-0}$. Moreover, the expectation functional $(E_t\Phi_{t+1})$ is concave in $x^t$ on the entire space, $t\in\tau_{-T}$.

Proof. By the assumptions (B2) and (B3), the profit functions are concave whereas the constraint functions are convex in the decision variables on the underlying Euclidean spaces. Using this observation, the claim follows from an argument similar to that in the proof of theorem 3.14. □

It should be remarked that condition (B2) can be relaxed to allow for $\xi$-dependent profit functions:

(B1)' = (B1);

(B2)' - the profit function $\rho_t$ is continuous on the underlying Euclidean space and concave in $x^t$ for fixed values of the stochastic parameters, $t\in\tau$;

- there is a convex neighborhood of $Y^t$ on which $\rho_t$ is a saddle function, being jointly concave in $(x^t,\xi^t)$ and convex in $\eta^t$, $t\in\tau$;

(B3)' = (B3); (B4)' = (B4); (B5)' = (B5).



Corollary 3.16 (Saddle Property II). Assume that the stochastic program (3.2) satisfies the regularity conditions (B1)'-(B5)'. Then, the conclusions of theorem 3.14 are still true.



Proof. With obvious modifications, the proof of theorem 3.14 still applies. □ 
The generalized regularity conditions (B1)'-(B5)' are of minor importance for practical applications. However, it is possible to formulate a third set of regularity conditions, which allows for a different class of profit and constraint functions.



(B1)'' = (B1);

(B2)'' - the profit function $\rho_t$ is continuous on the underlying Euclidean space and concave in $x^t$ for fixed values of the stochastic parameters, $t\in\tau$;

- there is a convex neighborhood of $Y^t$ where $\rho_t$ is a saddle function, being jointly concave in $(x_t,\xi_t)$, concave in $x^{t-1}$, convex in $\eta^t$, and constant in $\xi^{t-1}$, $t\in\tau$;

(B3)'' - the constraint function $f_t$ is continuous on the underlying Euclidean space and convex in $x^t$ for fixed values of the stochastic parameters, $t\in\tau$;

- there is a convex neighborhood of $Y^t$ where $f_t$ is jointly convex in $(x_t,\xi_t)$, convex in $x^{t-1}$, and constant in $(\eta^t,\xi^{t-1})$, $t\in\tau$;

- the feasible set mapping $X_t$ is bounded on a neighborhood of $Z^t$, $t\in\tau$;

(B4)'' the random data $\{\eta_t,\xi_t\}_{t\in\tau}$ follow a block-diagonal autoregressive process; $\mathcal{E}^o_t$ and $\mathcal{E}^r_t$ are compact simplices, while the matrix of AR coefficients $H^r_t$ equals 0 for all $t\in\tau_{-0}$;

(B5)'' = (B5).



Corollary 3.17 (Saddle Property III). Assume that the stochastic optimization problem (3.2) fulfils the conditions (B1)''-(B5)''. Then, the recourse function $\Phi_t$ is convex in $\eta^t$, biconcave (separately concave) in $x^{t-1}$ and $\xi_t$, and independent of $\xi^{t-1}$ on a convex neighborhood of $Z^t$, $t\in\tau_{-0}$. Moreover, $\Phi_0$ is convex in $\eta_0$ and concave in $\xi_0$ on a convex neighborhood of $Z^0$. The expectation functional $(E_t\Phi_{t+1})$ is concave in $x^t$, convex in $\eta^t$, and constant in $\xi^t$ on a convex neighborhood of $Y^t$, $t\in\tau_{-T}$.

Proof. The proof of corollary 3.17 largely parallels that of theorem 3.14. Here, however, we exploit the fact that the expectation functionals are independent of the random parameters $\xi$ due to the assumptions (B2)'', (B3)'', and (B4)''.³ □

Notice that condition (B3)'' allows for a special class of separable constraint functions $f_t=f^1_t+f^2_t$, where $f^1_t$ is constant in $(x^{t-1},\xi_t)$ and convex in $x_t$, whereas $f^2_t$ is constant in $x_t$ and biconvex (separately convex) in $x^{t-1}$ and $\xi_t$. Thus, corollary 3.17 allows for stochastic technology matrices in linear multistage stochastic programs.⁴ On the other hand, the uncertain parameters $\xi_t$ necessarily describe a sequence of independent random variables, which lack the flexibility to fit empirical data with serial correlations. Because of this major drawback we will concentrate on the regularity conditions (B1)-(B5) in the remainder.

³ Corollary 3.17 is closely related to theorem 1 in Birge and Louveaux [10, Sect. 11.1], which basically exploits the convexity properties of a linear stochastic program to construct a dual feasible solution.



3.6 Subdifferentiability 

In this section we will argue that the regularity conditions (B1)-(B5) imply subdifferentiability of the recourse functions and the expectation functionals on their natural domains. Subdifferentiability and continuity are complementary properties of convex and concave functions in the sense that neither is implied by the other. For example, consider the concave function defined through

$$f:\mathbb{R}_+\to\mathbb{R},\qquad x\mapsto\sqrt{x}.$$

This function is continuous relative to its domain, but it is not subdifferentiable at the origin, $\partial f(0)=\emptyset$. On the other hand, an example of an lsc convex function which is not continuous relative to the domain of its subdifferential mapping can be found in [89, p. 83]. However, Rockafellar shows that continuity, subdifferentiability, and pointwise finiteness are equivalent properties of a saddle function defined on an open subset of a finite-dimensional Euclidean space (cf. [89, theorem 10.1 and theorem 23.4]).

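A quick numerical check makes the $\sqrt{x}$ example concrete. The sketch below is only an illustration (the helper name `violating_point` is hypothetical): for any candidate slope $g$ it exhibits a point $x>0$ with $\sqrt{x}>f(0)+g\,x$, so $g$ cannot be a supergradient of the concave function $f$ at the origin.

```python
import math

def violating_point(g: float) -> float:
    """Return some x > 0 with sqrt(x) > sqrt(0) + g * x, so g is no supergradient at 0."""
    if g <= 0.0:
        return 1.0                  # sqrt(1) = 1 > 0 >= g * 1
    return 1.0 / (4.0 * g * g)      # sqrt(x) = 1/(2g), whereas g * x = 1/(4g)

candidates = [-1.0, 0.0, 0.5, 10.0, 1e6]
witnesses = {g: violating_point(g) for g in candidates}
for g, x in witnesses.items():
    print(f"g = {g:g}: x = {x:.3e}, sqrt(x) - g*x = {math.sqrt(x) - g * x:.3e}")
```

The steeper the candidate slope, the closer to the origin the violating point lies, which reflects the unbounded derivative of $\sqrt{x}$ at 0.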
Theorem 3.18 (Subdifferentiability). Consider a stochastic program subject to the conditions (B1)-(B5). Then, the recourse function $\Phi_t$ is subdifferentiable on a neighborhood of $Z^t$, $t\in\tau$. Moreover, the expectation functional $(E_t\Phi_{t+1})$ is subdifferentiable on a neighborhood of $Y^t$, $t\in\tau_{-T}$.

Proof. We will argue that $\Phi_t$ is subdifferentiable and continuous on $Z^t_\cap\times Z^t_\cup$ for all indices $t\in\tau$, whereas $(E_t\Phi_{t+1})$ is subdifferentiable and continuous on $Y^t_\cap\times Y^t_\cup$ for $t\in\tau_{-T}$. As usual, the claim is proved by backward induction. First, we show that $\Phi_T$ is pointwise finite on $Z^T_\cap\times Z^T_\cup$. To this end, select a reference point $(x^{T-1},\eta^T,\xi^T)\in Z^T_\cap\times Z^T_\cup$. By assertion (g) of proposition 3.13 the corresponding stage $T$ feasible set $X_T(x^{T-1},\xi^T)$ is compact and non-empty, while the associated objective function

$$x_T\mapsto\rho_T(x^T,\eta^T,\xi^T)$$

is continuous on $X_T(x^{T-1},\xi^T)$. This follows immediately from assertion (f) of proposition 3.13. Then, the Weierstrass maximum theorem implies finiteness of $\Phi_T(x^{T-1},\eta^T,\xi^T)$. As the choice of the reference point was arbitrary, $\Phi_T$ is pointwise finite on $Z^T_\cap\times Z^T_\cup$. In the proof of theorem 3.14 we showed that $\Phi_T$ is a saddle function on the open set $Z^T_\cap\times Z^T_\cup$. Thus, pointwise finiteness ensures via [89, theorem 23.4 and theorem 10.1] that $\Phi_T$ is subdifferentiable and continuous on $Z^T_\cap\times Z^T_\cup$, which proves the basis step.

⁴ Cf. Sect. 5.4.




Fig. 3.4. Visualization of the generalized feasible sets $Y^t$ and $Z^t$. By definition, $\operatorname{lev}_{\le 0}f_t$ is a subset of $\mathbb{R}^{n^t}\times\mathbb{R}^{K^t}\times\mathbb{R}^{L^t}$ and a superset of $Y^t$. The projection of $Y^t$ on $\mathbb{R}^{n^{t-1}}\times\mathbb{R}^{K^t}\times\mathbb{R}^{L^t}$ coincides with $Z^t$. Each reference point $(x^{t-1},\eta^t,\xi^t)$ in $Z^t$ defines a fibre in $Y^t$ (bold line), whose projection on $\mathbb{R}^{n_t}$ corresponds to the feasible set $X_t(x^{t-1},\xi^t)$



Let us now assume that we have already shown subdifferentiability and continuity of $\Phi_{t+1}$ on $Z^{t+1}_\cap\times Z^{t+1}_\cup$. Then, it can easily be verified that the recourse function $\Phi_{t+1}$ (more precisely, $\Phi_{t+1}\circ H_{t+1}$) is also continuous on $Y^t_\cap\times Y^t_\cup\times\mathcal{E}^o_{t+1}\times\mathcal{E}^r_{t+1}$, as implied by assertion (e) of proposition 3.13. Moreover, the expectation functional $(E_t\Phi_{t+1})$ is continuous and, a fortiori, pointwise finite on $Y^t_\cap\times Y^t_\cup$ by the dominated convergence theorem, continuity of the integrand, and compactness of the integration region. Likewise, assumption (B2) requires continuity of the profit function $\rho_t$ on the entire underlying space and, a fortiori, on $Y^t_\cap\times Y^t_\cup$.

Next, select a reference point $(x^{t-1},\eta^t,\xi^t)\in Z^t_\cap\times Z^t_\cup$. By assertion (g) of proposition 3.13 the corresponding stage $t$ feasible set $X_t(x^{t-1},\xi^t)$ is compact and non-empty, while the associated objective function

$$x_t\mapsto\rho_t(x^t,\eta^t,\xi^t)+(E_t\Phi_{t+1})(x^t,\eta^t,\xi^t)$$

is continuous on $X_t(x^{t-1},\xi^t)$ (see also Fig. 3.4). This follows immediately from the continuity results of the previous paragraph and assertion (f) of proposition 3.13. Then, the Weierstrass maximum theorem implies finiteness of $\Phi_t(x^{t-1},\eta^t,\xi^t)$. As the choice of the reference point was arbitrary, $\Phi_t$ is pointwise finite on $Z^t_\cap\times Z^t_\cup$.

In the proof of theorem 3.14 we showed that $\Phi_t$ is a saddle function on the open set $Z^t_\cap\times Z^t_\cup$, while the expectation functional $(E_t\Phi_{t+1})$ is a saddle function on the open set $Y^t_\cap\times Y^t_\cup$. Thus, pointwise finiteness ensures via [89, theorem 23.4 and theorem 10.1] that $\Phi_t$ is subdifferentiable and continuous on $Z^t_\cap\times Z^t_\cup$, whereas $(E_t\Phi_{t+1})$ is subdifferentiable and continuous on $Y^t_\cap\times Y^t_\cup$. □

Theorem 3.18 is closely related to the stability criterion in [42, theorem 10.2]. Moreover, it provides a slight generalization of the results in [43, Sect. 2], as our definition of Slater points allows for stochastic programs with equality constraints.
Barycentric Approximation Scheme



4.1 Scenario Generation 

The solution of stochastic programs poses severe difficulties, especially in the 
multistage case. If the underlying probability space is continuous, the static 
version of some stochastic program represents an optimization problem over 
an infinite-dimensional function space (see e.g. (2.8)). Then, analytical solu- 
tions are available only for unrealistically simple models. Analytical treatment 
of the dynamic version of a stochastic program is no less challenging. The rea- 
son for that is twofold. First, instead of one ‘large’ mathematical program one 
faces for each decision stage a parametric family of ‘smaller’ mathematical 
programs. Moreover, evaluation of the expectation functionals requires multi- 
variate integration of a function which is only known implicitly as the result 
of a subordinate parametric optimization problem. 

Numerical solutions are usually based on a suitable discretization of the 
continuous probability space. The standard approach is to select a discrete 
probability measure with finite support and solve the stochastic program with 
respect to this discrete auxiliary measure instead of the continuous original 
measure. In doing so, one effectively approximates the original stochastic pro- 
gram by an optimization problem over a finite-dimensional Euclidean space, 
which is numerically tractable. 

The selection of an appropriate discrete probability measure is referred 
to as scenario generation and represents a primary challenge in the field of 
stochastic programming. For notational convenience, here, we work with the space $(\Omega,\mathcal{B}(\Omega),P)$ instead of the augmented probability space¹ introduced in Sect. 3.1. Scenario generation is based upon the following procedure, which is motivated by the exposition in [44, Sect. 3]. For each $t\in\tau_{-0}$ let $P_t$ be a regular conditional probability for $P$ on $\mathcal{B}(\Omega_t)\times\Omega^{t-1}$, and let $P_0$ be the marginal probability measure of $P$ on $\mathcal{B}(\Omega_0)$. Assume that for each outcome history $\omega^{t-1}\in\Omega^{t-1}$ a discrete measure $P^d_t(\cdot|\omega^{t-1})$ approximates $P_t(\cdot|\omega^{t-1})$ on $\mathcal{B}(\Omega_t)$, and a discrete measure $P^d_0$ approximates $P_0$ on $\mathcal{B}(\Omega_0)$. Furthermore, for every $t\in\tau_{-0}$ the assignment

$$P^d_t:\ \mathcal{B}(\Omega_t)\times\Omega^{t-1}\to[0,1],\qquad (A,\omega^{t-1})\mapsto P^d_t(A|\omega^{t-1})$$

characterizes a transition probability, i.e. it complies with the following conditions:

(i) $P^d_t(\cdot|\omega^{t-1})$ is a regular probability measure on $\mathcal{B}(\Omega_t)$ for any fixed outcome history $\omega^{t-1}\in\Omega^{t-1}$;

(ii) $P^d_t(A|\cdot)$ is a Borel measurable function on $\Omega^{t-1}$ for every fixed Borel subset $A$ of $\Omega_t$.

¹ Without loss of generality, we may assume that $\Omega=\Theta\times\Xi$.



Then, by the product measure theorem [2, Sect. 2.6], the marginal measure $P^d_0$ and the transition probabilities $P^d_t$ of stages $t\in\tau_{-0}$ can be combined to a unique probability measure $P^d$ on the measurable space $(\Omega,\mathcal{B}(\Omega))$. Concretely speaking, the product measure

$$P^d:=P^d_0*P^d_1*\cdots*P^d_T$$

is defined through

$$P^d(A):=\int_{\Omega_0}\int_{\Omega_1}\cdots\int_{\Omega_T}\mathbb{1}_A(\omega)\,dP^d_T(\omega_T|\omega^{T-1})\cdots dP^d_1(\omega_1|\omega_0)\,dP^d_0(\omega_0),$$

where $A$ is an arbitrary Borel subset of $\Omega$, and $\mathbb{1}_A$ is the corresponding characteristic function. The discrete measure $P^d$ represents an approximation of the original measure $P$, and the transition probabilities $P^d_t$ can be viewed as regular conditional probabilities corresponding to $P^d$. If the supports

$$\mathcal{A}_t(\omega^{t-1}):=\operatorname{supp}P^d_t(\cdot|\omega^{t-1})\quad\text{and}\quad\mathcal{A}_0:=\operatorname{supp}P^d_0$$

are finite sets, then the support of $P^d$ defines a finite scenario tree $\mathcal{A}$.

$$\mathcal{A}:=\operatorname{supp}P^d=\{\omega\in\Omega\mid\omega_t\in\mathcal{A}_t(\omega^{t-1})\ \forall t\in\tau_{-0},\ \omega_0\in\mathcal{A}_0\}$$

Any outcome history $\omega\in\mathcal{A}$ is called a scenario, and the associated path probability amounts to

$$P^d(\{\omega\})=P^d_0(\{\omega_0\})\prod_{t=1}^{T}P^d_t(\{\omega_t\}|\omega^{t-1}).$$

Let $\mathcal{A}^t$ be the projection of $\mathcal{A}$ on $\Omega^t$. Then, the outcome histories $\omega^t\in\mathcal{A}^t$ are referred to as stage $t$ nodes of the underlying scenario tree. Moreover, the cardinality of the finite set $\mathcal{A}_t(\omega^{t-1})$ characterizes the branching factor of the scenario tree at node $\omega^{t-1}\in\mathcal{A}^{t-1}$.

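To make the path-probability formula tangible, here is a small sketch (not part of the original text) that enumerates a scenario tree from a root distribution and stage-wise transition probabilities. The function names, the binary branching, the up/down factors 1.1 and 0.9, and the conditional probabilities 0.7 and 0.4 are hypothetical choices. Each path probability is the product of the conditional probabilities along the path, and with a fixed branching factor $b$ the tree has $b^T$ scenarios.

```python
from typing import Callable, List, Tuple

History = Tuple[float, ...]
Branches = List[Tuple[float, float]]  # pairs (outcome w_t, probability P^d_t({w_t} | w^{t-1}))

def scenario_tree(root: Branches,
                  transition: Callable[[int, History], Branches],
                  horizon: int) -> List[Tuple[History, float]]:
    """Enumerate all scenarios (w_0, ..., w_T) together with their path probabilities."""
    scenarios = [((w0,), p0) for w0, p0 in root]          # stage 0 nodes with P^d_0({w_0})
    for t in range(1, horizon + 1):
        scenarios = [(history + (wt,), prob * pt)          # multiply conditional probabilities
                     for history, prob in scenarios
                     for wt, pt in transition(t, history)]
    return scenarios

def persistent_binary(t: int, history: History) -> Branches:
    """Hypothetical transition: an up-move is more likely after an up-move (t is unused here)."""
    went_up = len(history) < 2 or history[-1] >= history[-2]
    p_up = 0.7 if went_up else 0.4
    last = history[-1]
    return [(1.1 * last, p_up), (0.9 * last, 1.0 - p_up)]

tree = scenario_tree([(1.0, 1.0)], persistent_binary, horizon=3)
for path, prob in tree:
    print([round(w, 3) for w in path], round(prob, 3))
```

With a deterministic root, binary branching and three stages the sketch yields 8 scenarios whose path probabilities sum to one; doubling the horizon to six stages squares the count to 64.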
It is intuitively clear that an auxiliary stochastic program on a discrete 
probability space can be a reasonable approximation for a stochastic program 
on a continuous probability space if the scenarios and the associated path 
probabilities are chosen suitably and if we consider enough scenarios. How- 
ever, there is always a tradeoff between accuracy, requiring as many scenarios 
as possible, and numerical efficiency, which requires the branching of the sce- 
nario tree to be sparse. Keeping the branching factor fixed at each node of 
the scenario tree, the dimension of the auxiliary stochastic program grows 
exponentially with the number of decision stages. This effect is referred to as the curse of dimensionality in the literature [7]. Consequently, the limited capacity of modern computers forces us to concentrate on few scenarios, which must be chosen with diligence. In fact, the discrete approximate probability measure $P^d$ should at least preserve some basic properties of the original continuous probability measure $P$. Moreover, it is indispensable that the solution of the
approximate problem can be related in some way to the solution of the origi- 
nal problem, i.e. the exact solution of the auxiliary problem should provide an 
approximate solution of the original stochastic program. Ideally, one can find 
a discrete probability measure such that the optimal value and the optimizer 
set of the auxiliary stochastic program are, in a quantitative sense, close to 
the optimal value and the optimizer set of the original optimization problem, 
respectively. 

One sophisticated scenario generation technique that handles these conflicting requirements is the barycentric approximation scheme by Frauendorfer [42-44,46]. This specific method exploits the structural properties of convex multistage stochastic programs on an augmented probability space which fulfill the regularity conditions (B1)-(B5). Furthermore, the barycentric approximation scheme yields two discrete probability measures $P^l$ and $P^u$ with few scenarios. The optimal values of the associated auxiliary stochastic programs provide upper and lower bounds on the optimal value of the original stochastic program. Therefore, $P^l$ and $P^u$ are referred to as bounding measures.

The key element in any scenario tree construction is the discretization of the conditional probability measure $P_t(\cdot|\omega^{t-1})$. As far as the barycentric approximation scheme is concerned, the discrete approximations $P^l_t(\cdot|\omega^{t-1})$ and $P^u_t(\cdot|\omega^{t-1})$ represent extremal probability measures associated with specific moment problems. Since the recourse functions of regular stochastic programs are subdifferentiable saddle functions, one can show that these extremal measures are completely determined by the distributional parameters of the original probability measure $P$. The underlying moment problems are formulated as semi-infinite linear programs on specially shaped polytopes with few vertices. Thus, the corresponding extremal measures have only few discretization points and can be used to synthesize two discrete probability measures $P^l$ and $P^u$ defined on sparse scenario trees. Repeated application of a simple duality argument shows that $P^l$ and $P^u$ are indeed bounding measures.



4.2 Approximation of Expectation Functionals 

For later reference we recall here some results by Frauendorfer [42, part III]. Concretely speaking, we address the problem of approximating the expectation of a saddle function from below and from above. Our argumentation applies to an important class of convex-concave saddle functions defined on a compact probability space. Although we have in mind the recourse functions of a stochastic program, we initially suppress all time indices and any dependency on the outcome and decision history in order to increase readability.

Formally speaking, let $\Phi$ be a real-valued saddle function on a probability space $(\Theta\times\Xi,\mathcal{B}(\Theta\times\Xi),P)$. As usual, $E(\cdot)$ denotes expectation with respect to $P$. Assume that $\Phi$ is convex in $\eta\in\Theta$, concave in $\xi\in\Xi$, and subdifferentiable on its entire domain. Moreover, assume that $\Theta\subset\mathbb{R}^K$ and $\Xi\subset\mathbb{R}^L$ are closed regular simplices such that $\operatorname{supp}(P)$ is a subset of the regular cross-simplex $\Theta\times\Xi$; the notion of a cross-simplex has been introduced in [42, p. 8]. Denote by $\{a_\nu\}_{\nu=0}^{K}$ the vertices of $\Theta$, and let $\{b_\mu\}_{\mu=0}^{L}$ be the vertices of $\Xi$. For the time being, the simplicial covering of the support of $P$ may seem arbitrary, but later it will become clear that this specific choice reduces the branching of the scenario tree to be built. A rectangular covering, on the other hand, leads to an excessive number of scenarios and results in a higher computational effort.²

Below it proves useful to work with barycentric coordinates. Then, the outcomes $\eta$ and $\xi$ are represented as convex combinations of the vertices

$$\eta=\sum_{\nu=0}^{K}\lambda_\nu a_\nu,\qquad \sum_{\nu=0}^{K}\lambda_\nu=1,\tag{4.1}$$

$$\xi=\sum_{\mu=0}^{L}\tau_\mu b_\mu,\qquad \sum_{\mu=0}^{L}\tau_\mu=1,\tag{4.2}$$

and the involved coefficients are called barycentric weights. These weights are nonnegative if $\eta$ and $\xi$ lie within $\Theta$ and $\Xi$, respectively. Furthermore, $\lambda_\nu$ and $\tau_\mu$ depend on the corresponding outcomes and can be combined to vectorial functions.

$$\lambda(\eta):=(\lambda_0(\eta),\dots,\lambda_K(\eta)),\qquad \tau(\xi):=(\tau_0(\xi),\dots,\tau_L(\xi))$$

² This was pointed out, e.g., by Birge and Wets [11, p. 77].
Inverting (4.1) and (4.2) yields explicit expressions for the barycentric weights

$$\lambda(\eta)=S^{-1}\begin{pmatrix}1\\ \eta\end{pmatrix},\qquad \tau(\xi)=T^{-1}\begin{pmatrix}1\\ \xi\end{pmatrix},\tag{4.3}$$

where

$$S:=\begin{pmatrix}1&1&\cdots&1\\ a_0&a_1&\cdots&a_K\end{pmatrix},\qquad T:=\begin{pmatrix}1&1&\cdots&1\\ b_0&b_1&\cdots&b_L\end{pmatrix}.$$

Notice that $S\in\mathbb{R}^{(K+1)\times(K+1)}$ and $T\in\mathbb{R}^{(L+1)\times(L+1)}$ are regular matrices since the underlying simplices are assumed to be non-degenerate. Obviously, the barycentric weights $\lambda_\nu$ are linear affine functions of $\eta$, and $\lambda_{\nu_1}(a_{\nu_2})=\delta_{\nu_1\nu_2}$ for all $\nu_1,\nu_2\in\{0,\dots,K\}$. In other words, $\lambda_\nu$ is the unique linear affine functional on $\mathbb{R}^K$ which adopts the value 1 at $\eta=a_\nu$ and vanishes at all the other vertices of $\Theta$. Similar statements hold for the functionals $\tau_\mu$. Next, we define the matrix of the generalized barycentric weights as the tensor product of $\lambda$ and $\tau$.

$$\gamma(\eta,\xi):=\lambda(\eta)\otimes\tau(\xi)\in\mathbb{R}^{(K+1)\times(L+1)}$$

The matrix elements of $\gamma$ are given by

$$\gamma_{\nu\mu}(\eta,\xi)=\lambda_\nu(\eta)\,\tau_\mu(\xi),$$

and they are bilinear affine functionals of $\eta$ and $\xi$ due to (4.3).

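The inversion (4.3) amounts to a single linear solve. The following sketch, assuming NumPy and vertices stored as the rows of an array, computes $\lambda(\eta)$, $\tau(\xi)$ and the matrix $\gamma(\eta,\xi)$ for a hypothetical pair of simplices: the unit triangle for $\Theta$ and the interval $[0,2]$ for $\Xi$. The function names are illustrative only.

```python
import numpy as np

def barycentric_weights(vertices: np.ndarray, point: np.ndarray) -> np.ndarray:
    """Solve (4.3); vertices has rows a_0, ..., a_K of a non-degenerate simplex in R^K."""
    S = np.vstack([np.ones(vertices.shape[0]), vertices.T])  # row of ones above vertex columns
    return np.linalg.solve(S, np.concatenate(([1.0], point)))

def generalized_weights(theta_vertices, xi_vertices, eta, xi) -> np.ndarray:
    """gamma(eta, xi) = lambda(eta) (x) tau(xi), a (K+1) x (L+1) matrix."""
    return np.outer(barycentric_weights(theta_vertices, eta),
                    barycentric_weights(xi_vertices, xi))

theta_vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # Theta: unit triangle
xi_vertices = np.array([[0.0], [2.0]])                           # Xi: interval [0, 2]
lam = barycentric_weights(theta_vertices, np.array([0.2, 0.3]))  # approx. [0.5, 0.2, 0.3]
gamma = generalized_weights(theta_vertices, xi_vertices,
                            np.array([0.2, 0.3]), np.array([0.5]))
print(lam)
print(gamma)
```

Evaluating `barycentric_weights` at a vertex returns the corresponding unit vector, which is the property $\lambda_{\nu_1}(a_{\nu_2})=\delta_{\nu_1\nu_2}$ stated in the text.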
With these definitions we are prepared to construct bounds on the expectation of a given saddle function $\Phi$. For didactic reasons, our argumentation
is divided into three steps. First, we consider a purely convex function and 
exploit Jensen’s inequality [57] to find a lower bound. In a second phase, we 
restrict ourselves to the study of a concave function and determine an upper 
bound by means of the Edmundson-Madansky inequality [35,69,70]. Finally, 
combining the inequalities by Jensen and Edmundson-Madansky naturally 
leads to bounds on the expectation of an arbitrary saddle function. 



4.2.1 Jensen’s Inequality 

This section is devoted to the study of degenerate saddle functions $\Phi(\eta,\xi)$ which are constant in the second argument. Thus, without loss of generality, the parameter $\xi$ can be disregarded, and we are left with the problem of approximating a convex mapping $\Phi(\eta)$ from below by linear affine functions. The corresponding (convex) feasible set

$$\mathcal{L}:=\{L\mid L(\eta)=\langle\alpha,\eta\rangle-\beta\le\Phi(\eta)\ \ \forall\eta\in\Theta\}$$

contains all linear affine minorants of $\Phi$. Moreover, the marginal measure $P^o$ on $(\Theta,\mathcal{B}(\Theta))$ is defined through $P^o(A):=P(A\times\Xi)$, $A\in\mathcal{B}(\Theta)$. For simplicity, $P$ and $P^o$ are identified below, and the superscript 'o' is omitted. The best minorant $L$ with respect to this probability measure is determined by maximizing the expected value of $L$ over the feasible set $\mathcal{L}$.

$$\sup_{L\in\mathcal{L}}\int_\Theta L(\eta)\,dP(\eta)=\sup_{\alpha,\beta}\int_\Theta\big(\langle\alpha,\eta\rangle-\beta\big)\,dP(\eta)\quad\text{s.t.}\quad\langle\alpha,\eta\rangle-\beta\le\Phi(\eta)\ \ \forall\eta\in\Theta\tag{4.4}$$

Since $L$ is affine, the integral in the objective of (4.4) reduces to $L(E(\eta))$ and is bounded above by $\Phi(E(\eta))$. As $\Phi$ was assumed to be subdifferentiable on $\Theta$, there exists at least one optimal supporting function $L^{opt}\in\mathcal{L}$ which satisfies the equality

$$L^{opt}(E(\eta))=\Phi(E(\eta)).$$

As indicated in Fig. 4.1(b), the solution $L^{opt}$ of the primal optimization problem (4.4) is not unique if the subdifferential of $\Phi$ at $E(\eta)$ contains more than one element. However, any optimal functional $L^{opt}$ is a lower bound for $\Phi$ and reproduces the maximal objective function value $\Phi(E(\eta))$. By linear programming duality we find a corresponding dual program (cf. [64, proposition 4.8]; for a general introduction to duality theory we refer to [90] and the summary in appendix A).

$$\inf_Q\int_\Theta\Phi(\eta)\,dQ(\eta)\quad\text{s.t.}\quad Q\text{ is a probability measure on }\Theta\text{ with }\int_\Theta\eta\,dQ(\eta)=E(\eta)\tag{4.5}$$

The moment problem (4.5) is in fact a minimization problem over all probability measures $Q$ conserving the expectation value (first moment) of the true measure $P$. For additional information about the use of generalized moment problems in stochastic programming we refer to Dupačová [26,28]. Since $\Phi$ is convex and subdifferentiable on $\Theta$, the dual problem (4.5) is solvable, and one specific solution $P^l$ is given by the Dirac measure concentrated at $E(\eta)$.

$$P^l=\delta_{E(\eta)}\tag{4.6}$$

The explicit construction of $L^{opt}$ and $P^l$ entails stability of the primal-dual pair (4.4) and (4.5), i.e. both problems are solvable and the optimal values coincide. In Fig. 4.1, the degenerate probability density function of $P^l$ is visualized as a narrow peak located at $E(\eta)$. However, it should be noticed that the dual solution is not necessarily unique. To see this, we recall the known fact that the optimal solutions of (4.4) and (4.5) are completely determined by primal and dual feasibility as well as complementary slackness (cf. [64, theorem 4.2]). In our example, the complementary slackness condition reads
$$P^l\big(\{\eta\in\Theta\mid L^{opt}(\eta)<\Phi(\eta)\}\big)=0.$$

Thus, the extremal probability measure $P^l$ is concentrated on those points where the optimal supporting function $L^{opt}$ touches $\Phi$. As illustrated in Fig. 4.1(a) and (b), the optimal dual solution is uniquely given by (4.6) if $\Phi$ is strictly convex at $E(\eta)$. However, consider a convex neighborhood $U$ of $E(\eta)$, and assume that $\Phi$ is linear affine on $U$ (cf. Fig. 4.1(c)). Then, the unique optimal supporting function $L^{opt}$ touches $\Phi$ on the entire neighborhood $U$, and any probability measure which preserves the true expectation value and whose support is covered by $U$ minimizes (4.5). Nevertheless, the discrete measure $P^l$, as specified in (4.6), always constitutes a possible solution. Remarkably, $P^l$ is independent of the underlying function $\Phi$. Combining the above results, we end up with the well-known Jensen inequality.

$$\int_\Theta\Phi(\eta)\,dP(\eta)\ge\int_\Theta L^{opt}(\eta)\,dP(\eta)=\int_\Theta\Phi(\eta)\,dP^l(\eta)\tag{4.7}$$

Equality of the second and the third term is due to strong duality. Notice that in most textbooks a slightly different notation is used: $E(\Phi(\eta))\ge\Phi(E(\eta))$.

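Since $P^l$ in (4.6) is a Dirac measure at $E(\eta)$, the Jensen bound only requires evaluating $\Phi$ at the mean. The sketch below, assuming NumPy, checks (4.7) for the empirical measure of Dirichlet samples on the unit triangle and a hypothetical convex test function (the squared Euclidean norm). For an empirical measure the inequality holds exactly, up to rounding.

```python
import numpy as np

def jensen_bound(phi, samples: np.ndarray) -> float:
    """Integrate phi against the Dirac measure (4.6), i.e. evaluate phi at the mean E(eta)."""
    return float(phi(samples.mean(axis=0)))

def phi(eta):
    # hypothetical convex test function: squared Euclidean norm, vectorized over the last axis
    return np.sum(np.asarray(eta) ** 2, axis=-1)

rng = np.random.default_rng(0)
samples = rng.dirichlet([2.0, 3.0, 4.0], size=10_000)[:, 1:]  # points of the unit triangle
lower = jensen_bound(phi, samples)
expected = float(phi(samples).mean())
print(f"Jensen bound {lower:.4f} <= E[phi] {expected:.4f}")
```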


4.2.2 Edmundson-Madansky Inequality 

Next, we focus on a degenerate saddle function which is independent of the stochastic parameter $\eta$. Concretely speaking, we attempt to approximate a concave function $\Phi(\xi)$ from below with hyperplanes. In complete analogy to the previous section, the set of feasible supporting functions is defined as

$$\mathcal{L}:=\{L\mid L(\xi)=\langle\alpha,\xi\rangle-\beta\le\Phi(\xi)\ \ \forall\xi\in\Xi\}$$

and contains all linear affine minorants of $\Phi$. As usual, the marginal measure $P^r$ on $(\Xi,\mathcal{B}(\Xi))$ is given by $P^r(A):=P(\Theta\times A)$, $A\in\mathcal{B}(\Xi)$. However, $P$ and $P^r$ shall be identified below, and the superscript 'r' is omitted. The best minorant with respect to this probability measure is determined by means of a semi-infinite linear program.

$$\sup_{L\in\mathcal{L}}\int_\Xi L(\xi)\,dP(\xi)=\sup_{\alpha,\beta}\int_\Xi\big(\langle\alpha,\xi\rangle-\beta\big)\,dP(\xi)\quad\text{s.t.}\quad\langle\alpha,\xi\rangle-\beta\le\Phi(\xi)\ \ \forall\xi\in\Xi\tag{4.8}$$

It is intuitively clear that the optimal hyperplane $L^{opt}$ intersects $\Phi$ at the vertices of the simplex $\Xi$; this implies that $\Phi(b_\mu)=L^{opt}(b_\mu)$ for all $\mu=0,\dots,L$, as sketched in Fig. 4.1(d).³ Moreover, the primal solution is most conveniently expressed in terms of the classical barycentric weights.

³ For a rigorous proof we refer to [41, Sect. 13].
Fig. 4.1. (a) If $\Phi(\eta)$ is strictly convex and differentiable, the primal and dual solutions of (4.4) are unique. (b) In the presence of a sharp bend at $E(\eta)$, every subgradient in $\partial\Phi(E(\eta))$ corresponds to an optimal supporting function (shaded region). However, the extremal measure $P^l$ remains unique. (c) Conversely, if $\Phi(\eta)$ is linear on a neighborhood $U$ of $E(\eta)$, uniqueness of the dual solution is no longer assured. (d) As argued in Sect. 4.2.2, the extremal measure associated with a concave function $\Phi(\xi)$ is concentrated at the vertices of the simplex $\Xi$

$$L^{opt}(\xi)=\sum_{\mu=0}^{L}\Phi(b_\mu)\,\tau_\mu(\xi)$$

It is worth mentioning that $L^{opt}$ is unique unless the simplex $\Xi$ is degenerate. Unsurprisingly, to the linear program (4.8) we can assign a dual moment problem.

$$\inf_Q\int_\Xi\Phi(\xi)\,dQ(\xi)\quad\text{s.t.}\quad Q\text{ is a probability measure on }\Xi\text{ with }\int_\Xi\xi\,dQ(\xi)=E(\xi)\tag{4.9}$$
measure $P^l$ vanishes everywhere but at the vertices of $\Xi$, where the optimal hyperplane $L^{opt}$ touches the concave function $\Phi$. Thus, $P^l$ reduces to a linear combination of Dirac measures.

$$P^l=\sum_{\mu=0}^{L}\tau_\mu(E(\xi))\,\delta_{b_\mu}\tag{4.10}$$

Observe that $P^l$ is, as in the previous section, independent of $\Phi$. Generically, (4.10) is the unique solution of the dual problem (4.9). However, if $\Phi$ is linear affine on at least one straight line segment between two vertices, then there exist additional extremal measures different from $P^l$. Inserting (4.10) into (4.9) yields the classical Edmundson-Madansky inequality.

$$\int_\Xi\Phi(\xi)\,dP(\xi)\ge\int_\Xi L^{opt}(\xi)\,dP(\xi)=\int_\Xi\Phi(\xi)\,dP^l(\xi)$$

In the next section we combine the Jensen and Edmundson-Madansky inequalities to construct an optimal bilinear approximation for an arbitrary saddle function $\Phi(\eta,\xi)$.

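The discrete measure (4.10) places the weight $\tau_\mu(E(\xi))$ on each vertex $b_\mu$, so it can be computed with the same linear solve as (4.3). The sketch below, assuming NumPy, builds these atoms and weights for the unit triangle and checks the Edmundson-Madansky inequality for the empirical measure of Dirichlet samples and a hypothetical concave test function (the negative squared norm).

```python
import numpy as np

def edmundson_madansky_measure(vertices: np.ndarray, mean: np.ndarray):
    """Atoms b_mu and weights tau_mu(E(xi)) of the discrete measure (4.10).

    vertices holds b_0, ..., b_L as rows of a non-degenerate simplex in R^L.
    """
    T = np.vstack([np.ones(vertices.shape[0]), vertices.T])      # matrix T of (4.3)
    weights = np.linalg.solve(T, np.concatenate(([1.0], mean)))  # tau(E(xi))
    return vertices, weights

def phi(xi):
    # hypothetical concave test function: negative squared norm, vectorized over the last axis
    return -np.sum(np.asarray(xi) ** 2, axis=-1)

rng = np.random.default_rng(1)
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
samples = rng.dirichlet([1.0, 1.0, 1.0], size=5_000)[:, 1:]
atoms, weights = edmundson_madansky_measure(vertices, samples.mean(axis=0))
lower = float(weights @ phi(atoms))
expected = float(phi(samples).mean())
print(f"Edmundson-Madansky bound {lower:.4f} <= E[phi] {expected:.4f}")
```

The weights preserve the first moment, so the discrete measure is feasible for (4.9), and, as the text notes, they do not depend on the function being bounded.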


4.2.3 Lower Barycentric Approximation 



Consider a subdifferentiable saddle function $\Phi(\eta,\xi)$ on a regular cross-simplex $\Theta\times\Xi$. In analogy to the previous sections, we aim at approximating $\Phi$ from below. Since $\Phi$ is convex along $\Theta$ and concave along $\Xi$, it is reasonable to study specific supporting functions which are linear affine both along $\Theta$ and along $\Xi$. Thus, the space of feasible supporting functions is defined as

$$\mathcal{L}:=\Big\{L\ \Big|\ L(\eta,\xi)=\begin{pmatrix}1\\ \eta\end{pmatrix}^{\!\top}C\begin{pmatrix}1\\ \xi\end{pmatrix}\le\Phi(\eta,\xi)\ \ \forall(\eta,\xi)\in\Theta\times\Xi\Big\}.$$

Obviously, $\mathcal{L}$ contains all bilinear affine minorants of $\Phi$, and each of its elements is uniquely characterized by a matrix $C\in\mathbb{R}^{(K+1)\times(L+1)}$. For the further argumentation it is convenient to give an alternative characterization of $\mathcal{L}$. Consider the set of functionals



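$$\tilde{\mathcal{L}}:=\Big\{L\ \Big|\ L(\eta,\xi)=\sum_{\mu=0}^{L}\tau_\mu(\xi)\big(\langle\alpha_\mu,\eta\rangle-\beta_\mu\big)\ \ \forall(\eta,\xi)\in\Theta\times\Xi,\quad\langle\alpha_\mu,\eta\rangle-\beta_\mu\le\Phi(\eta,b_\mu)\ \ \forall\eta\in\Theta,\ \mu=0,\dots,L\Big\}.$$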

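Below we shall argue that $\tilde{\mathcal{L}}$ coincides with the feasible set $\mathcal{L}$. By definition of the barycentric weights, any element of $\tilde{\mathcal{L}}$ can be written as

$$L(\eta,\xi)=\sum_{\mu=0}^{L}\tau_\mu(\xi)\big(\langle\alpha_\mu,\eta\rangle-\beta_\mu\big)=\begin{pmatrix}1\\ \eta\end{pmatrix}^{\!\top}\underbrace{\begin{pmatrix}-\beta_0&\cdots&-\beta_L\\ \alpha_0&\cdots&\alpha_L\end{pmatrix}}_{=:A}T^{-1}\begin{pmatrix}1\\ \xi\end{pmatrix}.$$

Consequently, each element of $\tilde{\mathcal{L}}$ is uniquely determined by some matrix $A\in\mathbb{R}^{(K+1)\times(L+1)}$ and corresponds to a bilinear affine functional of the form used in the definition of $\mathcal{L}$ with $C=AT^{-1}$. Since $T$ is a regular matrix, backtransformation is straightforward, $A=CT$. Furthermore, it can easily be seen that $\tilde{\mathcal{L}}$ contains only minorants of $\Phi$:

$$\begin{aligned}L(\eta,\xi)&=\sum_{\mu=0}^{L}\tau_\mu(\xi)\big(\langle\alpha_\mu,\eta\rangle-\beta_\mu\big)\\ &\le\sum_{\mu=0}^{L}\tau_\mu(\xi)\,\Phi(\eta,b_\mu) &&\text{(by definition of }\tilde{\mathcal{L}})\\ &\le\Phi\Big(\eta,\sum_{\mu=0}^{L}\tau_\mu(\xi)\,b_\mu\Big)=\Phi(\eta,\xi) &&\text{(concavity of }\Phi\text{ in }\xi)\end{aligned}$$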



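Collecting the above results, it is obvious that $\tilde{\mathcal{L}}$ contains only bilinear affine minorants of $\Phi$, i.e. $\tilde{\mathcal{L}}\subseteq\mathcal{L}$. Conversely, if $L\in\mathcal{L}$ is represented by the matrix $C$ and we set $A:=CT$, then evaluating $L$ at the vertex $\xi=b_\mu$ yields $\langle\alpha_\mu,\eta\rangle-\beta_\mu\le\Phi(\eta,b_\mu)$ for all $\eta\in\Theta$, whence $\mathcal{L}\subseteq\tilde{\mathcal{L}}$; this observation proves equality of the two sets under consideration.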
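The change of representation between the two descriptions of the feasible set is pure linear algebra. As a sanity check, the following sketch (assuming NumPy; the dimensions, the random matrix `A` and the unit simplex chosen for $\Xi$ are hypothetical) verifies that $\sum_\mu\tau_\mu(\xi)(\langle\alpha_\mu,\eta\rangle-\beta_\mu)$ equals $(1,\eta)^\top AT^{-1}(1,\xi)$, that $A=CT$, and that evaluating the bilinear form at a vertex $b_\mu$ recovers the $\mu$-th affine piece. Because `A` is random, only the identity is checked, not the minorant property.

```python
import numpy as np

rng = np.random.default_rng(2)
K, L = 2, 3
xi_vertices = np.vstack([np.zeros(L), np.eye(L)])  # rows b_0, ..., b_L of the unit simplex in R^L
T = np.vstack([np.ones(L + 1), xi_vertices.T])     # matrix T of (4.3)
A = rng.normal(size=(K + 1, L + 1))                # column mu stores (-beta_mu, alpha_mu)
C = A @ np.linalg.inv(T)                           # C = A T^{-1}

def minorant_sum(eta, xi):
    """sum_mu tau_mu(xi) * (<alpha_mu, eta> - beta_mu)."""
    tau = np.linalg.solve(T, np.concatenate(([1.0], xi)))
    return sum(tau[mu] * (A[1:, mu] @ eta + A[0, mu]) for mu in range(L + 1))

def minorant_bilinear(eta, xi):
    """(1, eta)^T C (1, xi), the bilinear affine form."""
    return np.concatenate(([1.0], eta)) @ C @ np.concatenate(([1.0], xi))

eta, xi = rng.normal(size=K), rng.dirichlet(np.ones(L + 1))[1:]
print(minorant_sum(eta, xi), minorant_bilinear(eta, xi))
```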
Let us now turn towards the determination of the best bilinear affine minorant of $\Phi$ with respect to the probability measure $P$. To this end, we study again a semi-infinite linear program.

$$\sup_{L\in\mathcal{L}}\int_{\Theta\times\Xi}L(\eta,\xi)\,dP(\eta,\xi)\tag{4.11}$$

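In order to solve (4.11), we rewrite its objective function with the help of some auxiliary probability measures $P_\mu$ defined on $\Theta\times\{b_\mu\}$, respectively.

$$\begin{aligned}\int_{\Theta\times\Xi}L(\eta,\xi)\,dP(\eta,\xi)&=\int_{\Theta\times\Xi}\sum_{\mu=0}^{L}\tau_\mu(\xi)\big(\langle\alpha_\mu,\eta\rangle-\beta_\mu\big)\,dP(\eta,\xi)\\ &=\sum_{\mu=0}^{L}q_\Xi(\mu)\int_\Theta\big(\langle\alpha_\mu,\eta\rangle-\beta_\mu\big)\underbrace{\int_\Xi\frac{\tau_\mu(\xi)}{q_\Xi(\mu)}\,dP(\eta,\xi)}_{=:dP_\mu(\eta)}\end{aligned}\tag{4.12}$$

Thereby, a sequence of $L+1$ nonnegative pseudo-probabilities $q_\Xi(\mu)$, $\mu=0,\dots,L$, has been introduced.

$$q_\Xi(\mu):=\int_{\Theta\times\Xi}\tau_\mu(\xi)\,dP(\eta,\xi)\ge0$$

Since the barycentric weights $\tau_\mu$ sum up to unity, we find $\sum_{\mu=0}^{L}q_\Xi(\mu)=1$. Moreover, by the definition of $q_\Xi(\mu)$ the auxiliary measures are normalized.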

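Substitution of the rearranged objective function (4.12) into (4.11) yields

$$\sup_{L\in\mathcal{L}}\int_{\Theta\times\Xi}L(\eta,\xi)\,dP(\eta,\xi)=\sum_{\mu=0}^{L}q_\Xi(\mu)\ \sup_{\alpha_\mu,\beta_\mu}\Big\{\int_\Theta\big(\langle\alpha_\mu,\eta\rangle-\beta_\mu\big)\,dP_\mu(\eta)\ \Big|\ \langle\alpha_\mu,\eta\rangle-\beta_\mu\le\Phi(\eta,b_\mu)\ \ \forall\eta\in\Theta\Big\}.$$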

Hence, for every vertex $b_\mu$ of $\Xi$ we have to find the optimal minorant of $\Phi(\cdot, b_\mu)$ with respect to the auxiliary measure $P_\mu$. This problem as well as its dual counterpart have already been solved in Sect. 4.2.1. Instead of the unbiased expectation value $E(\eta)$ considered there, we now have to work with the so-called generalized barycenters $\eta_\mu$, $\mu = 0,\dots,L$.

$$\eta_\mu := \int_\Theta \eta\, dP_\mu(\eta) = \sum_{\nu=0}^{K} a_\nu \underbrace{\int_{\Theta\times\Xi} \frac{\lambda_\nu(\eta)\,\tau_\mu(\xi)}{q_\Xi(\mu)}\, dP(\eta,\xi)}_{=\,\lambda_\nu(\eta_\mu)} \tag{4.13}$$

Apparently, integration of the generalized barycentric weights $\lambda_\nu(\eta)\,\tau_\mu(\xi)$ over the augmented probability space and normalization with $q_\Xi(\mu)$ yields the classical barycentric weights of the generalized barycenters. Remember that the generalized barycentric weights are bilinear affine in the outcomes $\eta$ and $\xi$, and therefore the barycenters are completely determined by the cross-moments of the stochastic parameters $\eta$ and $\xi$. For later use we define $m \in \mathbb{R}^{(K+1)\times(L+1)}$ as the matrix of cross moments, and its entries are denoted by $m_{\nu\mu}$, $\nu = 0,\dots,K$ and $\mu = 0,\dots,L$.

$$m := \int_{\Theta\times\Xi} \begin{pmatrix} 1 \\ \eta \end{pmatrix} \begin{pmatrix} 1 \\ \xi \end{pmatrix}^{\!\top} dP(\eta,\xi)$$

Notice that $m_{00} = 1$ since $P$ is a probability measure, which distributes unit mass on $\Theta\times\Xi$. After all, the generalized barycenters provide an explicit solution of the linear program (4.11).

$$\sup_{L\in\mathcal{L}} \int_{\Theta\times\Xi} L(\eta,\xi)\, dP(\eta,\xi) = \sum_{\mu=0}^{L} q_\Xi(\mu)\, \Phi(\eta_\mu, b_\mu) \tag{4.14}$$
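Since everything depends on $P$ only through $m$, the barycenters can be computed from the cross-moment matrix alone. In this sketch ($K = L = 1$, the same made-up atoms as before) the hypothetical matrices $S$ and $T$ hold the homogeneous vertex coordinates of $\Theta$ and $\Xi$, so that $W_{\nu\mu} = \int \lambda_\nu(\eta)\tau_\mu(\xi)\,dP = (S^{-1} m\, T^{-\top})_{\nu\mu}$. The resulting $q_\Xi(\mu)$ and $\eta_\mu$ are compared with direct integration.

```python
atoms = [(0.2, 0.1, 0.3), (0.7, 0.4, 0.2), (0.5, 0.9, 0.5)]  # (eta, xi, prob), illustrative
a = (0.0, 1.0)   # vertices of Theta
b = (0.0, 1.0)   # vertices of Xi


def inv_vertex_matrix(v0, v1):
    """Inverse of [[1, 1], [v0, v1]]; row k holds (constant, slope) of the k-th weight."""
    d = v1 - v0
    return [[v1 / d, -1.0 / d], [-v0 / d, 1.0 / d]]


# Cross-moment matrix m_{nu mu} = E[(1, eta)_nu (1, xi)_mu]; m[0][0] = 1.
m = [[sum(p * (1.0, eta)[i] * (1.0, xi)[j] for eta, xi, p in atoms) for j in range(2)]
     for i in range(2)]

S_inv, T_inv = inv_vertex_matrix(*a), inv_vertex_matrix(*b)
# W = S^{-1} m T^{-T}: integrals of the generalized barycentric weights lambda_nu * tau_mu.
W = [[sum(S_inv[nu][i] * m[i][j] * T_inv[mu][j] for i in range(2) for j in range(2))
      for mu in range(2)] for nu in range(2)]

q_xi = [W[0][mu] + W[1][mu] for mu in range(2)]        # the lambda weights sum to one
eta_bar = [sum(a[nu] * W[nu][mu] for nu in range(2)) / q_xi[mu] for mu in range(2)]


# Direct integration for comparison.
def tau(xi):
    w1 = (xi - b[0]) / (b[1] - b[0])
    return (1.0 - w1, w1)


q_direct = [sum(p * tau(xi)[mu] for _, xi, p in atoms) for mu in range(2)]
eta_direct = [sum(p * eta * tau(xi)[mu] for eta, xi, p in atoms) / q_direct[mu]
              for mu in range(2)]
print(m, q_xi, eta_bar)
```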



As outlined in the discussion of Jensen's inequality, the optimal supporting function $L^{\mathrm{opt}}(\eta,\xi)$ is not necessarily unique, since the subdifferential of $\Phi(\cdot, b_\mu)$ may contain more than one subgradient at $\eta_\mu$.

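As in the context of the Jensen and Edmundson-Madansky inequalities, it is worthwhile to investigate the dual counterpart of the linear program (4.11).

$$\begin{aligned} \inf_{Q} \;& \int_{\Theta\times\Xi} \Phi(\eta,\xi)\, dQ(\eta,\xi) \\ \text{s.t.}\;& Q \text{ is a probability measure on } \Theta\times\Xi \text{ with} \\ & \int_{\Theta\times\Xi} \begin{pmatrix} 1 \\ \eta \end{pmatrix} \begin{pmatrix} 1 \\ \xi \end{pmatrix}^{\!\top} dQ(\eta,\xi) = m \end{aligned} \tag{4.15}$$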
The dual program (4.15) constitutes a minimization problem over all probability measures on $\Theta\times\Xi$ preserving the cross moments $m_{\nu\mu}$. Taking account of complementary slackness, we make a plausible ansatz for the optimal solution $P^l$ of the moment problem (4.15).

$$P^l := \sum_{\mu=0}^{L} q_\Xi(\mu)\, \delta_{(\eta_\mu, b_\mu)} \tag{4.16}$$

Thus, $P^l$ has a discrete support, $\operatorname{supp} P^l = \{(\eta_\mu, b_\mu) \mid \mu = 0,\dots,L\}$, and the probability associated with a specific atom $(\eta_\mu, b_\mu)$ is given by $q_\Xi(\mu)$. Much like in the previous sections, $P^l$ is concentrated on those points where the optimal bilinear supporting function $L^{\mathrm{opt}}$ touches the saddle function $\Phi$. By plugging (4.16) into (4.15) and comparing with (4.14), it can be verified that $P^l$ is in fact a valid solution of the dual problem.⁴ Notice that $P^l$ is completely determined by the cross moments $m_{\nu\mu}$. A fortiori, it is independent of the saddle function $\Phi$. In summary, we can derive a useful inequality that closely resembles the inequalities by Jensen and Edmundson-Madansky.

$$\int_{\Theta\times\Xi} \Phi(\eta,\xi)\, dP(\eta,\xi) \ge \int_{\Theta\times\Xi} L^{\mathrm{opt}}(\eta,\xi)\, dP(\eta,\xi) = \int_{\Theta\times\Xi} \Phi(\eta,\xi)\, dP^l(\eta,\xi) \tag{4.17}$$

The right hand side of (4.17) is called the lower barycentric approximation of the expectation functional $(E\Phi) := \int \Phi(\eta,\xi)\, dP(\eta,\xi)$. In fact, the discrete extremal probability measure $P^l$ and the lower barycentric approximation are crucial ingredients of the scenario generation method for multistage stochastic programs which will be presented below.
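The inequality (4.17) and the moment-matching property of $P^l$ can be illustrated on the same made-up discrete stand-in for $P$. The sketch assumes the hypothetical saddle function $\Phi(\eta,\xi) = \eta^2 + \eta\xi - \xi^2$, which is convex in $\eta$ and concave in $\xi$ as required by the lower construction; `lower_measure` and `cross_moments` are illustrative helper names.

```python
atoms = [(0.2, 0.1, 0.3), (0.7, 0.4, 0.2), (0.5, 0.9, 0.5)]  # stand-in for P (illustrative)
b = (0.0, 1.0)                                              # vertices of Xi


def phi(eta, xi):
    """Hypothetical saddle function: convex in eta, concave in xi."""
    return eta ** 2 + eta * xi - xi ** 2


def tau(xi):
    w1 = (xi - b[0]) / (b[1] - b[0])
    return (1.0 - w1, w1)


def lower_measure(atoms):
    """Atoms (eta_mu, b_mu, q_Xi(mu)) of the extremal measure P^l in (4.16)."""
    out = []
    for mu in range(2):
        q = sum(p * tau(xi)[mu] for _, xi, p in atoms)
        eta_mu = sum(p * eta * tau(xi)[mu] for eta, xi, p in atoms) / q
        out.append((eta_mu, b[mu], q))
    return out


def cross_moments(measure):
    """Matrix m with entries E[(1, eta)_i (1, xi)_j]."""
    return [[sum(p * (1.0, e)[i] * (1.0, x)[j] for e, x, p in measure) for j in range(2)]
            for i in range(2)]


P_l = lower_measure(atoms)
exact = sum(p * phi(eta, xi) for eta, xi, p in atoms)
lower = sum(p * phi(eta, xi) for eta, xi, p in P_l)
print(lower, exact)   # lower barycentric approximation vs. E[phi]
```

The two-atom measure reproduces all four cross moments of the three-atom measure, yet its expectation of $\Phi$ is smaller, as (4.17) predicts.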



4.2.4 Upper Barycentric Approximation 



By symmetry, the problem of finding an optimal upper bound for $(E\Phi)$ can easily be reduced to the problem studied in the previous section. As the argumentation remains essentially the same, it is sufficient to present the results. For instance, the set of all bilinear affine majorants of $\Phi$ is given by

$$\mathcal{U} := \Big\{ U \;\Big|\; U(\eta,\xi) = \sum_{\nu=0}^{K} \lambda_\nu(\eta)\big(\langle\alpha_\nu,\xi\rangle - \beta_\nu\big) \;\; \forall (\eta,\xi)\in\Theta\times\Xi, \;\; \langle\alpha_\nu,\xi\rangle - \beta_\nu \ge \Phi(a_\nu,\xi) \;\; \forall \xi\in\Xi,\; \nu = 0,\dots,K \Big\}.$$

The best majorant of the saddle function is evaluated by means of the following semi-infinite linear program:

$$\inf_{U\in\mathcal{U}} \int_{\Theta\times\Xi} U(\eta,\xi)\, dP(\eta,\xi) \tag{4.18}$$



4 More details are provided in [42, Sect. 14]. 
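The corresponding dual problem reads

$$\begin{aligned} \sup_{Q} \;& \int_{\Theta\times\Xi} \Phi(\eta,\xi)\, dQ(\eta,\xi) \\ \text{s.t.}\;& Q \text{ is a probability measure on } \Theta\times\Xi \text{ with} \\ & \int_{\Theta\times\Xi} \begin{pmatrix} 1 \\ \eta \end{pmatrix} \begin{pmatrix} 1 \\ \xi \end{pmatrix}^{\!\top} dQ(\eta,\xi) = m \end{aligned} \tag{4.19}$$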



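As in the previous section, we introduce a sequence of nonnegative pseudo-probabilities $q_\Theta(\nu)$, $\nu = 0,\dots,K$.

$$q_\Theta(\nu) := \int_{\Theta\times\Xi} \lambda_\nu(\eta)\, dP(\eta,\xi) \ge 0$$

In analogy to (4.13), the generalized barycenters are defined as

$$\xi_\nu := \sum_{\mu=0}^{L} b_\mu \underbrace{\int_{\Theta\times\Xi} \frac{\lambda_\nu(\eta)\,\tau_\mu(\xi)}{q_\Theta(\nu)}\, dP(\eta,\xi)}_{=\,\tau_\mu(\xi_\nu)}.$$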



The barycenters as well as the pseudo-probabilities only depend on the cross moments $m_{\nu\mu}$, and they completely determine the extremal measure $P^u$ which solves the moment problem (4.19).

$$P^u := \sum_{\nu=0}^{K} q_\Theta(\nu)\, \delta_{(a_\nu, \xi_\nu)} \tag{4.20}$$

The discrete support of $P^u$ is given by $\operatorname{supp} P^u = \{(a_\nu, \xi_\nu) \mid \nu = 0,\dots,K\}$, and the probability associated with a specific atom $(a_\nu, \xi_\nu)$ amounts to $q_\Theta(\nu)$.
By strong duality, we end up with the requested inequality providing an upper bound for the expectation functional.

$$\int_{\Theta\times\Xi} \Phi(\eta,\xi)\, dP(\eta,\xi) \le \int_{\Theta\times\Xi} U^{\mathrm{opt}}(\eta,\xi)\, dP(\eta,\xi) = \int_{\Theta\times\Xi} \Phi(\eta,\xi)\, dP^u(\eta,\xi) \tag{4.21}$$

We refer to the right hand side of (4.21) as the upper barycentric approximation of the expectation functional $(E\Phi)$.
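The upper construction mirrors the lower one, with the roles of $\Theta$ and $\Xi$ exchanged. This sketch reuses the made-up atoms and the hypothetical saddle function $\Phi(\eta,\xi) = \eta^2 + \eta\xi - \xi^2$; it places the masses $q_\Theta(\nu)$ at $(a_\nu, \xi_\nu)$ and checks both the preserved cross moments and the bound (4.21).

```python
# Same illustrative stand-in for P as before: (eta, xi, probability).
atoms = [(0.2, 0.1, 0.3), (0.7, 0.4, 0.2), (0.5, 0.9, 0.5)]
a = (0.0, 1.0)  # vertices a_0, a_1 of Theta (K = 1)


def phi(eta, xi):
    """Hypothetical saddle function: convex in eta, concave in xi."""
    return eta ** 2 + eta * xi - xi ** 2


def lam(eta):
    """Classical barycentric weights of eta with respect to [a_0, a_1]."""
    w1 = (eta - a[0]) / (a[1] - a[0])
    return (1.0 - w1, w1)


def upper_measure(atoms):
    """Atoms (a_nu, xi_nu, q_Theta(nu)) of the extremal measure P^u in (4.20)."""
    out = []
    for nu in range(2):
        q = sum(p * lam(eta)[nu] for eta, _, p in atoms)
        xi_nu = sum(p * xi * lam(eta)[nu] for eta, xi, p in atoms) / q
        out.append((a[nu], xi_nu, q))
    return out


def cross_moments(measure):
    """Matrix m with entries E[(1, eta)_i (1, xi)_j]."""
    return [[sum(p * (1.0, e)[i] * (1.0, x)[j] for e, x, p in measure) for j in range(2)]
            for i in range(2)]


P_u = upper_measure(atoms)
exact = sum(p * phi(eta, xi) for eta, xi, p in atoms)
upper = sum(p * phi(eta, xi) for eta, xi, p in P_u)
print(exact, upper)   # E[phi] vs. upper barycentric approximation
```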



4.3 Partitioning 

[Fig. 4.2. Bilinear affine approximations of a saddle function over a two-dimensional cross-simplex $\Theta\times\Xi$; marked points: $(a_0,\xi_0)$, $(a_1,\xi_1)$, $(\eta_0,b_0)$, $(\eta_1,b_1)$.]

Depending on the curvature of the saddle function $\Phi$ and the range of the random vectors $\eta$ and $\xi$, the estimates (4.17) and (4.21) for the expectation functional $(E\Phi)$ may be very coarse. If these estimates are unsatisfactory, the probability measure $P$ can be represented as a convex combination of specific probability measures with a smaller support. Then, barycentric discretization is applied to each of these measures separately. Formally speaking, let $P_{J,i}$ be a regular probability measure on the augmented probability space $(\Theta\times\Xi, \mathcal{B}(\Theta\times\Xi))$ for all $i = 1,\dots,I(J)$ such that

$$P = \sum_{i=1}^{I(J)} \varrho_{J,i}\, P_{J,i}, \qquad \sum_{i=1}^{I(J)} \varrho_{J,i} = 1, \qquad \varrho_{J,i} > 0 \;\; \forall i = 1,\dots,I(J). \tag{4.22}$$

The decomposition (4.22) is termed a partition of the measure $P$, and $J$ will be referred to as the refinement parameter. Moreover, for every admissible combination of $J$ and $i$ let $\Theta_{J,i}\times\Xi_{J,i} \subseteq \Theta\times\Xi$ be a regular cross-simplex which covers the support of $P_{J,i}$. Then, each probability measure $P_{J,i}$ can be discretized via the barycentric approximation developed in the previous sections. Concretely speaking, there are two discrete extremal probability measures $P^l_{J,i}$ and $P^u_{J,i}$ on the Borel space $(\Theta_{J,i}\times\Xi_{J,i}, \mathcal{B}(\Theta_{J,i}\times\Xi_{J,i}))$ with the same cross moments as the original measure $P_{J,i}$ such that

$$\int \Phi(\eta,\xi)\, dP^l_{J,i}(\eta,\xi) \le \int \Phi(\eta,\xi)\, dP_{J,i}(\eta,\xi) \le \int \Phi(\eta,\xi)\, dP^u_{J,i}(\eta,\xi)$$
for all subdifferentiable saddle functions $\Phi$. The discrete measures $P^l_{J,i}$ and $P^u_{J,i}$ are the solutions of the moment problems (4.15) and (4.19), where $P$ and $\Theta\times\Xi$ are substituted by $P_{J,i}$ and $\Theta_{J,i}\times\Xi_{J,i}$, respectively. A partition of the form (4.22) always exists. For instance, suppose that $\{\tilde\Theta_{J,i}\times\tilde\Xi_{J,i} \mid i = 1,\dots,I(J)\}$ is a disjoint set-partition of the augmented probability space $\Theta\times\Xi$, i.e.

$$\Theta\times\Xi = \bigcup_{i=1}^{I(J)} \tilde\Theta_{J,i}\times\tilde\Xi_{J,i}, \qquad (\tilde\Theta_{J,i}\times\tilde\Xi_{J,i}) \cap (\tilde\Theta_{J,i'}\times\tilde\Xi_{J,i'}) = \emptyset \;\; \text{for } i \ne i'.$$



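Without much loss of generality, we may assume that $\tilde\Theta_{J,i}\times\tilde\Xi_{J,i}$ has non-zero probability mass, and its closure represents a compact regular cross-simplex.⁵ Then, the probability measure $P_{J,i}$ can be defined through

$$P_{J,i}(A) := \frac{P\big(A \cap (\tilde\Theta_{J,i}\times\tilde\Xi_{J,i})\big)}{P(\tilde\Theta_{J,i}\times\tilde\Xi_{J,i})} \qquad \forall A \in \mathcal{B}(\Theta\times\Xi).$$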



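Furthermore, set $\varrho_{J,i} := P(\tilde\Theta_{J,i}\times\tilde\Xi_{J,i})$ and $\Theta_{J,i}\times\Xi_{J,i} := \operatorname{cl}(\tilde\Theta_{J,i}\times\tilde\Xi_{J,i})$. These specifications are clearly in accordance with the general requirements (4.22). In general, for every partition of the form (4.22) one can define two discrete measures

$$P^l_J := \sum_{i=1}^{I(J)} \varrho_{J,i}\, P^l_{J,i} \qquad\text{and}\qquad P^u_J := \sum_{i=1}^{I(J)} \varrho_{J,i}\, P^u_{J,i}$$

on the augmented probability space $\Theta\times\Xi$. The following chain of inequalities is straightforward from the construction of the extremal probability measures $P^l_{J,i}$ and $P^u_{J,i}$:

$$\int \Phi(\eta,\xi)\, dP^l_J(\eta,\xi) \le \int \Phi(\eta,\xi)\, dP(\eta,\xi) \le \int \Phi(\eta,\xi)\, dP^u_J(\eta,\xi)$$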

One may generate an 'improved' partition of the probability measure $P$ by partitioning some or all components $P_{J,i}$ of some initial partition indexed by the given parameter $J$. The nested partition, which will be indexed by $J+1$, is usually referred to as a refinement of partition $J$. By successively refining the partition of $P$, thereby increasing the number $I(J)$ of components, one can construct two sequences of discrete probability measures $\{P^l_J\}_{J\in\mathbb{N}}$ and $\{P^u_J\}_{J\in\mathbb{N}}$ approximating the original measure $P$.



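Proposition 4.1 (Monotonicity). Both sequences $\{P^l_J\}_{J\in\mathbb{N}}$ and $\{P^u_J\}_{J\in\mathbb{N}}$ corresponding to a successively refined partition of the original measure $P$ are monotone in the following sense:

$$\int_{\Theta\times\Xi} \Phi(\eta,\xi)\, dP^l_J(\eta,\xi) \ge \int_{\Theta\times\Xi} \Phi(\eta,\xi)\, dP^l_{J'}(\eta,\xi) \qquad \forall J \ge J'$$

and

$$\int_{\Theta\times\Xi} \Phi(\eta,\xi)\, dP^u_J(\eta,\xi) \le \int_{\Theta\times\Xi} \Phi(\eta,\xi)\, dP^u_{J'}(\eta,\xi) \qquad \forall J \ge J'$$

for all subdifferentiable saddle functions $\Phi$ on $\Theta\times\Xi$.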

5 Such set-partitions are very popular in the stochastic programming literature; see, e.g., Huang et al. [56], Birge and Wets [11], or Kall et al. [45,60].
Proof. (Cf. Frauendorfer [42, theorem 16.2].) Without loss of generality, we investigate the sequence of lower probability measures $\{P^l_J\}_{J\in\mathbb{N}}$. Furthermore, it suffices to prove the assertion for the special case $I(J') = 1$. Then, $P^l_{J'}$ is given by the extremal measure (4.16) and corresponds to a trivial partition of the original measure $P$. In the remainder, we shall make use of the strong duality result discussed in Sect. 4.2.3. As usual, set

$$\mathcal{L} := \{L \mid L \text{ is bilinear affine on } \Theta\times\Xi,\; L \le \Phi \text{ on } \Theta\times\Xi\},$$

and for all $i = 1,\dots,I(J)$ define

$$\mathcal{L}_{J,i} := \{L \mid L \text{ is bilinear affine on } \Theta\times\Xi,\; L \le \Phi \text{ on } \Theta_{J,i}\times\Xi_{J,i}\}.$$

Then, the expectation of any subdifferentiable saddle function $\Phi$ with respect to the probability measure $P^l_{J'}$ corresponds to the optimal value of the following semi-infinite linear program:

$$\int \Phi(\eta,\xi)\, dP^l_{J'}(\eta,\xi) = \sup_{L\in\mathcal{L}} \int L(\eta,\xi)\, dP(\eta,\xi). \tag{4.23}$$

Similarly, the expectation of $\Phi$ with respect to $P^l_J$ amounts to

$$\int \Phi(\eta,\xi)\, dP^l_J(\eta,\xi) = \sum_{i=1}^{I(J)} \varrho_{J,i} \sup_{L_{J,i}\in\mathcal{L}_{J,i}} \int L_{J,i}(\eta,\xi)\, dP_{J,i}(\eta,\xi). \tag{4.24}$$

In a next step we have to show that (4.23) is smaller than or equal to (4.24). To this end, assume that $L$ is a bilinear affine functional feasible in (4.23). Then, for every index $i$ set $L_{J,i} := L$. One can easily verify that these functionals are feasible in (4.24) and, by (4.22), have the same objective function value as $L$ in (4.23). Thus, the expectation of $\Phi$ with respect to $P^l_J$ is at least as large as the expectation with respect to $P^l_{J'}$, yielding monotonicity of the lower expectation functionals under refinements. An analogous argument holds for the sequence of upper measures. □
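Proposition 4.1 can be observed numerically by refining a grid partition. The sketch below is an illustration under stated assumptions, not the construction from the cited references: $P$ is replaced by 500 pseudo-random atoms in $[0,1]^2$, the set-partition consists of half-open grid cells whose closures are the cross-simplices $\Theta_{J,i}\times\Xi_{J,i}$, and $\Phi(\eta,\xi) = \eta^2 + \eta\xi - \xi^2$ is a hypothetical saddle function. Grid sizes $n = 1, 2, 4, 8$ give nested partitions, so the lower values should not decrease and the upper values should not increase.

```python
import random


def phi(eta, xi):
    """Hypothetical saddle function on [0,1]^2: convex in eta, concave in xi."""
    return eta ** 2 + eta * xi - xi ** 2


def cell_bounds(atoms, a0, a1, b0, b1):
    """Unnormalized lower/upper barycentric approximations of sum p*phi over `atoms`,
    which must lie in the cross-simplex [a0, a1] x [b0, b1]."""
    low = up = 0.0
    for mu, b in enumerate((b0, b1)):           # lower measure: atoms (eta_mu, b_mu)
        w = [p * ((xi - b0) if mu else (b1 - xi)) / (b1 - b0) for _, xi, p in atoms]
        q = sum(w)
        if q > 0:
            low += q * phi(sum(wi * eta for wi, (eta, _, _) in zip(w, atoms)) / q, b)
    for nu, a in enumerate((a0, a1)):           # upper measure: atoms (a_nu, xi_nu)
        w = [p * ((eta - a0) if nu else (a1 - eta)) / (a1 - a0) for eta, _, p in atoms]
        q = sum(w)
        if q > 0:
            up += q * phi(a, sum(wi * xi for wi, (_, xi, _) in zip(w, atoms)) / q)
    return low, up


def partition_bounds(atoms, n):
    """Bounds for E[phi] after splitting [0,1]^2 into an n x n grid of cells."""
    cells = {}
    for eta, xi, p in atoms:
        key = (min(int(eta * n), n - 1), min(int(xi * n), n - 1))
        cells.setdefault(key, []).append((eta, xi, p))
    h = 1.0 / n
    low = up = 0.0
    for (i, j), sub in cells.items():
        l, u = cell_bounds(sub, i * h, (i + 1) * h, j * h, (j + 1) * h)
        low, up = low + l, up + u
    return low, up


rng = random.Random(0)
N = 500
atoms = [(rng.random(), rng.random(), 1.0 / N) for _ in range(N)]   # stand-in for P
exact = sum(p * phi(eta, xi) for eta, xi, p in atoms)
bounds = {n: partition_bounds(atoms, n) for n in (1, 2, 4, 8)}      # nested refinements
for n, (low, up) in bounds.items():
    print(n, low, exact, up)
```

Working with unnormalized weights inside each cell is a convenience: the factor $\varrho_{J,i}$ is then carried implicitly by the atom probabilities.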

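Definition 4.2. A sequence $\{P_J\}_{J\in\mathbb{N}}$ of probability measures on the augmented probability space $(\Theta\times\Xi, \mathcal{B}(\Theta\times\Xi))$ is said to converge weakly to a probability measure $P$ if and only if

$$\lim_{J\to\infty} \int_{\Theta\times\Xi} \Phi(\eta,\xi)\, dP_J(\eta,\xi) = \int_{\Theta\times\Xi} \Phi(\eta,\xi)\, dP(\eta,\xi)$$

for all continuous functions $\Phi$ on $\Theta\times\Xi$.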



Proposition 4.3 (Convergence). The sequences $\{P^l_J\}_{J\in\mathbb{N}}$ and $\{P^u_J\}_{J\in\mathbb{N}}$ converge weakly to the probability measure $P$ provided that the diameters of all cross-simplices $\{\Theta_{J,i}\times\Xi_{J,i} \mid i = 1,\dots,I(J)\}$ become arbitrarily small when $J$ is increased.
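Proof. Without loss of generality we focus on the sequence of lower probability measures $\{P^l_J\}_{J\in\mathbb{N}}$. Choose any continuous function $\Phi$ on $\Theta\times\Xi$. By compactness, $\Phi$ is uniformly continuous on its domain, i.e. for every tolerance $\varepsilon > 0$ there exists a real number $\delta > 0$ such that

$$|\Phi(\eta,\xi) - \Phi(\eta',\xi')| < \varepsilon \qquad \forall (\eta,\xi), (\eta',\xi') \in \Theta\times\Xi \text{ with } \|(\eta,\xi) - (\eta',\xi')\| < \delta.$$

Next, choose $J$ large enough to guarantee the inequality

$$\operatorname{diam}(\Theta_{J,i}\times\Xi_{J,i}) < \delta \qquad \forall i = 1,\dots,I(J).$$

Then, since both $P_{J,i}$ and $P^l_{J,i}$ are probability measures supported on $\Theta_{J,i}\times\Xi_{J,i}$, we have

$$\Big| \int \Phi\, dP_{J,i} - \int \Phi\, dP^l_{J,i} \Big| \le \sup\{\Phi(\eta,\xi) \mid (\eta,\xi)\in\Theta_{J,i}\times\Xi_{J,i}\} - \inf\{\Phi(\eta,\xi) \mid (\eta,\xi)\in\Theta_{J,i}\times\Xi_{J,i}\}.$$

The last expression is smaller than $\varepsilon$ due to uniform continuity of $\Phi$. This simple estimate implies that

$$\Big| \int \Phi\, dP - \int \Phi\, dP^l_J \Big| \le \sum_{i=1}^{I(J)} \varrho_{J,i} \Big| \int \Phi\, dP_{J,i} - \int \Phi\, dP^l_{J,i} \Big| < \varepsilon,$$

and therefore $P^l_J$ converges weakly to the original probability measure $P$. □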
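Weak convergence is claimed for every continuous $\Phi$, not only for saddle functions. In the following sketch $P$ is assumed to be uniform on $[0,1]^2$ and is partitioned into an $n\times n$ grid. On each square cell the conditional measure has independent coordinates, so $q_\Xi(0) = q_\Xi(1) = 1/2$ and both generalized barycenters coincide with the cell midpoint in $\eta$. The test function $\sin(3\eta)\cos(2\xi)$ is an arbitrary, hypothetical choice that is not a saddle function, so here the lower approximation need not be a lower bound. It still approaches the exact integral as the cells shrink.

```python
import math


def phi(eta, xi):
    """Arbitrary continuous test function (not required to be a saddle function)."""
    return math.sin(3.0 * eta) * math.cos(2.0 * xi)


def lower_approx(n):
    """Integral of phi w.r.t. P^l_J for P uniform on [0,1]^2 and an n x n grid partition.

    On a cell [c - h/2, c + h/2] x [d - h/2, d + h/2] the conditional measure is uniform with
    independent coordinates, so P^l_{J,i} puts mass 1/2 on (c, d - h/2) and on (c, d + h/2).
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        c = (i + 0.5) * h
        for j in range(n):
            d = (j + 0.5) * h
            total += h * h * 0.5 * (phi(c, d - h / 2) + phi(c, d + h / 2))
    return total


exact = (1.0 - math.cos(3.0)) / 3.0 * math.sin(2.0) / 2.0
errors = {n: abs(lower_approx(n) - exact) for n in (1, 4, 16, 64)}
print(errors)
```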



4.4 Barycentric Scenario Trees 

In this section we will construct a 'lower' ('upper') discrete probability measure $P^l$ ($P^u$) on the augmented probability space of a multistage stochastic program. By replacing the original measure $P$ with the discrete measure $P^l$ or $P^u$ in (3.1), an auxiliary stochastic program is generated, which allows for numerical solution. We study the structural properties of these auxiliary optimization problems under specific regularity conditions. In particular, we investigate the relationship between the recourse functions of the original and the auxiliary stochastic programs. Under certain circumstances, the auxiliary problems associated with the measures $P^l$ and $P^u$ provide lower and upper bounds on the optimal value of the original problem.

Consider a multistage stochastic program on an augmented probability space which satisfies at least the basic regularity conditions (B1) and (B4).
Throughout this chapter we will explicitly use the standard assumption that the random outcomes at stage 0 are deterministic. Thus, we consider $\eta_0$ and $\xi_0$ as fixed parameters, implying that $P_0$ reduces to a Dirac measure. Next, for every time index $t\in\tau_{-0}$ and for any refinement parameter $J\in\mathbb{N}$ we establish a partition of the conditional probability $P_t$ by representing it as a combination of suitable transition probabilities.

$$P_t(\cdot \mid \eta^{t-1},\xi^{t-1}) = \sum_{i_t=1}^{I_t(J)} \varrho_{t,J,i_t}(\eta^{t-1},\xi^{t-1})\, P_{t,J,i_t}(\cdot \mid \eta^{t-1},\xi^{t-1}) \tag{4.25}$$

For any admissible combination of indices $t$, $J$, and $i_t$, we require the transition probability $P_{t,J,i_t}: \mathcal{B}(\Theta_t\times\Xi_t) \times \Theta^{t-1}\times\Xi^{t-1} \to [0,1]$ to comply with the following conditions (see also Sect. 4.1):

(i) $P_{t,J,i_t}(\cdot \mid \eta^{t-1},\xi^{t-1})$ is a regular probability measure on $\mathcal{B}(\Theta_t\times\Xi_t)$ for any fixed outcome history $(\eta^{t-1},\xi^{t-1}) \in \Theta^{t-1}\times\Xi^{t-1}$;

(ii) $P_{t,J,i_t}(A \mid \cdot)$ is a Borel measurable function on $\Theta^{t-1}\times\Xi^{t-1}$ for every fixed Borel subset $A$ of $\Theta_t\times\Xi_t$.

Furthermore, the measurable function $\varrho_{t,J,i_t}: \Theta^{t-1}\times\Xi^{t-1} \to \mathbb{R}$ is nonnegative, and we postulate

$$\sum_{i_t=1}^{I_t(J)} \varrho_{t,J,i_t} = 1 \qquad \forall t\in\tau_{-0},\; J\in\mathbb{N}. \tag{4.26}$$

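In a next step, for every admissible combination of $t$, $J$, and $i_t$ we define two families of closed regular simplices in $\mathbb{R}^{K_t}$ and $\mathbb{R}^{L_t}$ as the convex hull of their vertices.

$$\begin{aligned} \Theta_{t,J,i_t}(\eta^{t-1},\xi^{t-1}) &= \operatorname{co}\{a_{\nu_t,J,i_t}(\eta^{t-1},\xi^{t-1}) \mid \nu_t = 0,\dots,K_t\} \\ \Xi_{t,J,i_t}(\eta^{t-1},\xi^{t-1}) &= \operatorname{co}\{b_{\mu_t,J,i_t}(\eta^{t-1},\xi^{t-1}) \mid \mu_t = 0,\dots,L_t\} \end{aligned} \tag{4.27}$$

Assume that the vertices $a_{\nu_t,J,i_t}$ and $b_{\mu_t,J,i_t}$ are measurable vector-valued functions on $\Theta^{t-1}\times\Xi^{t-1}$. The Cartesian product of the simplices (4.27) constitutes a regular cross-simplex

$$\Theta_{t,J,i_t}(\eta^{t-1},\xi^{t-1}) \times \Xi_{t,J,i_t}(\eta^{t-1},\xi^{t-1}) \subseteq \Theta_t\times\Xi_t. \tag{4.28}$$

Notice that $\Theta_{t,J,i_t}$ and $\Xi_{t,J,i_t}$ can be viewed as measurable multifunctions since the vertices are measurable single-valued functions. This is a direct consequence of [91, proposition 1H]. Apart from measurability of the vertices and the inclusion (4.28), we require

$$\operatorname{supp} P_{t,J,i_t}(\cdot \mid \eta^{t-1},\xi^{t-1}) \subseteq \Theta_{t,J,i_t}(\eta^{t-1},\xi^{t-1}) \times \Xi_{t,J,i_t}(\eta^{t-1},\xi^{t-1}). \tag{4.29}$$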
A partition of the form (4.25) with the properties (4.26)-(4.29) always exists. In fact, the regularity condition (B4) requires the probability distribution $Q_t$ of the disturbances $(\varepsilon^\eta_t, \varepsilon^\xi_t)$ at stage $t$ to be independent of the outcome history. Thus, for every time index $t\in\tau_{-0}$ and for every refinement parameter $J\in\mathbb{N}$ the measure $Q_t$ can be represented as a convex combination of $I_t(J)$ regular probability measures with a (potentially) reduced support (see Sect. 4.3).

$$Q_t = \sum_{i_t=1}^{I_t(J)} \varrho_{t,J,i_t}\, Q_{t,J,i_t}, \qquad \sum_{i_t=1}^{I_t(J)} \varrho_{t,J,i_t} = 1 \qquad \forall t\in\tau_{-0},\; J\in\mathbb{N} \tag{4.30}$$

Here, $Q_{t,J,i_t}$ is a regular probability measure, while $\varrho_{t,J,i_t}$ is a nonnegative real number for every admissible combination of $t$, $J$, and $i_t$. Any partition of $Q_t$ as in (4.30) naturally induces a partition of the conditional probability $P_t$ as in (4.25). Concretely speaking, given the partition (4.30) we may define the transition probability $P_{t,J,i_t}$ through

$$P_{t,J,i_t}(A \mid \eta^{t-1},\xi^{t-1}) := Q_{t,J,i_t}\big(\{(\varepsilon^\eta_t,\varepsilon^\xi_t) \mid (H^\eta_t\eta^{t-1} + \varepsilon^\eta_t,\; H^\xi_t\xi^{t-1} + \varepsilon^\xi_t) \in A\}\big), \tag{4.31}$$

where $A$ is an arbitrary Borel subset of $\Theta_t\times\Xi_t$, and $(\eta^{t-1},\xi^{t-1})$ is an arbitrary element of the marginal space $\Theta^{t-1}\times\Xi^{t-1}$. Next, define the coefficient function $\varrho_{t,J,i_t}(\cdot)$ in (4.25) as the constant functional equal to the nonnegative real number $\varrho_{t,J,i_t}$ from (4.30). Moreover, for every admissible $t$, $J$, and $i_t$ let $\mathcal{E}^\eta_{t,J,i_t}\times\mathcal{E}^\xi_{t,J,i_t}$ be a regular cross-simplex contained in $\mathcal{E}^\eta_t\times\mathcal{E}^\xi_t$ which covers the support of the probability measure $Q_{t,J,i_t}$. Then, the multifunctions (4.27) can be defined as

$$\Theta_{t,J,i_t}(\eta^{t-1},\xi^{t-1}) := H^\eta_t\eta^{t-1} + \mathcal{E}^\eta_{t,J,i_t}, \qquad \Xi_{t,J,i_t}(\eta^{t-1},\xi^{t-1}) := H^\xi_t\xi^{t-1} + \mathcal{E}^\xi_{t,J,i_t}. \tag{4.32}$$

Obviously, the vertices of these simplices reduce to linear affine and, a fortiori, measurable functions of the outcome history, and the inclusions (4.28) and (4.29) are satisfied by construction.
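Under a construction like (4.31)-(4.32), the simplices are translates of a fixed disturbance simplex, so the barycentric weights of $(\eta_t,\xi_t)$ coincide with those of the disturbances. Consequently the pseudo-probabilities do not depend on the history, and the atoms move affinely with it. The sketch below illustrates this for $K_t = L_t = 1$ under stated assumptions: scalar coefficients `H_ETA`, `H_XI`, a made-up three-point disturbance distribution `Q` covered by the interval $[D_0, D_1]$ in the $\xi$-direction, and a hypothetical helper `lower_transition`. Only the $\xi$-simplex is needed for the lower measure.

```python
# Hypothetical linear dynamics in the spirit of (4.31):
#   eta_t = H_ETA * eta_{t-1} + eps_eta,   xi_t = H_XI * xi_{t-1} + eps_xi
H_ETA, H_XI = 0.8, 0.5
D0, D1 = -0.5, 0.5                                            # simplex covering eps_xi
Q = [(-0.3, 0.1, 0.25), (0.4, -0.2, 0.5), (0.9, 0.4, 0.25)]  # (eps_eta, eps_xi, prob)


def lower_transition(eta_prev, xi_prev):
    """Atoms (eta_mu, b_mu) and masses q_Xi(mu) of P^l_t(. | eta_prev, xi_prev)."""
    b0, b1 = H_XI * xi_prev + D0, H_XI * xi_prev + D1         # shifted Xi-simplex, cf. (4.32)
    out = []
    for mu, b in enumerate((b0, b1)):
        q = num = 0.0
        for e_eta, e_xi, p in Q:
            eta, xi = H_ETA * eta_prev + e_eta, H_XI * xi_prev + e_xi
            t1 = (xi - b0) / (b1 - b0)                        # barycentric weight tau_1
            w = p * (t1 if mu else 1.0 - t1)
            q += w
            num += w * eta
        out.append((num / q, b, q))
    return out


base, shifted = lower_transition(0.0, 0.0), lower_transition(1.0, 2.0)
for (eta_a, b_a, q_a), (eta_b, b_b, q_b) in zip(base, shifted):
    # atoms shift by (H_ETA * 1.0, H_XI * 2.0); the masses are unchanged
    print(eta_b - eta_a, b_b - b_a, q_b - q_a)
```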

Given any partition (4.25) of the conditional probability $P_t$, we then attempt to discretize the associated transition probabilities $P_{t,J,i_t}$ separately by means of the barycentric approximations. For ease of exposition, we suppress the fixed indices $J$ and $i_t$ in the subsequent discussion and merely consider possible dependencies on time and outcome history.

For any outcome history $(\eta^{t-1},\xi^{t-1})$, the stage $t$ outcomes $\eta_t$ and $\xi_t$ can be represented as convex combinations of the vertices in (4.27). Thus, the corresponding coordinates, i.e. the classical barycentric weights, depend measurably on the outcome history, cf. also (4.3).

$$\begin{aligned} \lambda_t(\eta_t \mid \eta^{t-1},\xi^{t-1}) &= \big(\lambda_{t,0}(\eta_t \mid \eta^{t-1},\xi^{t-1}),\dots,\lambda_{t,K_t}(\eta_t \mid \eta^{t-1},\xi^{t-1})\big) \\ \tau_t(\xi_t \mid \eta^{t-1},\xi^{t-1}) &= \big(\tau_{t,0}(\xi_t \mid \eta^{t-1},\xi^{t-1}),\dots,\tau_{t,L_t}(\xi_t \mid \eta^{t-1},\xi^{t-1})\big) \end{aligned}$$







Obviously, measurability is inherited by the generalized barycentric weights

$$\lambda_{t,\nu_t}(\eta_t \mid \eta^{t-1},\xi^{t-1})\,\tau_{t,\mu_t}(\xi_t \mid \eta^{t-1},\xi^{t-1}).$$

Having in mind the extremal probability measures constructed in Sects. 4.2.3 and 4.2.4, we can now define two sequences of pseudo-probabilities.

$$q_{\Xi_t}(\mu_t \mid \eta^{t-1},\xi^{t-1}) := \int \tau_{t,\mu_t}(\xi_t \mid \eta^{t-1},\xi^{t-1})\, dP_t(\eta_t,\xi_t \mid \eta^{t-1},\xi^{t-1}) \tag{4.33a}$$

$$q_{\Theta_t}(\nu_t \mid \eta^{t-1},\xi^{t-1}) := \int \lambda_{t,\nu_t}(\eta_t \mid \eta^{t-1},\xi^{t-1})\, dP_t(\eta_t,\xi_t \mid \eta^{t-1},\xi^{t-1}) \tag{4.33b}$$

The pseudo-probabilities are well-defined, strictly positive, and measurable in the outcome history as implied by the measurability of the classical barycentric weights, the properties of the transition probability $P_t$, and the generalized Fubini theorem [2, theorem 2.6.4]. In the spirit of the previous sections, we define two sequences of generalized barycenters as






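$$\eta_{t,\mu_t}(\eta^{t-1},\xi^{t-1}) := \sum_{\nu_t=0}^{K_t} a_{\nu_t}(\eta^{t-1},\xi^{t-1}) \int \frac{\lambda_{t,\nu_t}(\eta_t \mid \eta^{t-1},\xi^{t-1})\,\tau_{t,\mu_t}(\xi_t \mid \eta^{t-1},\xi^{t-1})}{q_{\Xi_t}(\mu_t \mid \eta^{t-1},\xi^{t-1})}\, dP_t(\eta_t,\xi_t \mid \eta^{t-1},\xi^{t-1}) \tag{4.34a}$$

and

$$\xi_{t,\nu_t}(\eta^{t-1},\xi^{t-1}) := \sum_{\mu_t=0}^{L_t} b_{\mu_t}(\eta^{t-1},\xi^{t-1}) \int \frac{\lambda_{t,\nu_t}(\eta_t \mid \eta^{t-1},\xi^{t-1})\,\tau_{t,\mu_t}(\xi_t \mid \eta^{t-1},\xi^{t-1})}{q_{\Theta_t}(\nu_t \mid \eta^{t-1},\xi^{t-1})}\, dP_t(\eta_t,\xi_t \mid \eta^{t-1},\xi^{t-1}). \tag{4.34b}$$

By means of elementary arguments it can be shown that the generalized barycenters are well-defined and measurable in the outcome history. Next, we use the pseudo-probabilities (4.33a) and the generalized barycenters (4.34a) to construct a parametric family of probability measures $P^l_t(\cdot \mid \eta^{t-1},\xi^{t-1})$, each of which has a finite support

$$\big\{\big(\eta_{t,\mu_t}(\eta^{t-1},\xi^{t-1}),\, b_{\mu_t}(\eta^{t-1},\xi^{t-1})\big) \;\big|\; \mu_t = 0,\dots,L_t\big\}$$

and associated probabilities $q_{\Xi_t}(\mu_t \mid \eta^{t-1},\xi^{t-1})$. Notice that $P^l_t$ characterizes a discrete transition probability approximating the original transition probability $P_t$. Analogously, we construct a transition probability $P^u_t$ corresponding to the pseudo-probabilities (4.33b) and the generalized barycenters (4.34b). In this case, the discrete probability measure $P^u_t(\cdot \mid \eta^{t-1},\xi^{t-1})$ associated with the outcome history $(\eta^{t-1},\xi^{t-1})$ has support

$$\big\{\big(a_{\nu_t}(\eta^{t-1},\xi^{t-1}),\, \xi_{t,\nu_t}(\eta^{t-1},\xi^{t-1})\big) \;\big|\; \nu_t = 0,\dots,K_t\big\},$$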
and the $\nu_t$'th atom has mass $q_{\Theta_t}(\nu_t \mid \eta^{t-1},\xi^{t-1})$. Reintroducing the suppressed indices $J$ and $i_t$, we denote by $P^l_{t,J,i_t}$ and $P^u_{t,J,i_t}$ the barycentric discretizations of the transition probability $P_{t,J,i_t}$. Then, convex combination yields discrete approximations $P^l_{t,J}$ and $P^u_{t,J}$ of the original conditional probability $P_t$.

$$P^l_{t,J}(\cdot \mid \eta^{t-1},\xi^{t-1}) := \sum_{i_t=1}^{I_t(J)} \varrho_{t,J,i_t}(\eta^{t-1},\xi^{t-1})\, P^l_{t,J,i_t}(\cdot \mid \eta^{t-1},\xi^{t-1}) \tag{4.35a}$$

$$P^u_{t,J}(\cdot \mid \eta^{t-1},\xi^{t-1}) := \sum_{i_t=1}^{I_t(J)} \varrho_{t,J,i_t}(\eta^{t-1},\xi^{t-1})\, P^u_{t,J,i_t}(\cdot \mid \eta^{t-1},\xi^{t-1}) \tag{4.35b}$$

Measurability of the coefficient functions guarantees that both $P^l_{t,J}$ and $P^u_{t,J}$ can be interpreted as transition probabilities, too. One may improve the approximations of the conditional probability $P_t$ by partitioning some or all components $P_{t,J,i_t}$ of some initial partition (4.25) indexed by the given parameter $J$. The nested partition, which will be indexed by $J+1$, is referred to as a refinement of partition $J$. By successively refining the partition of $P_t$, thereby increasing the number $I_t(J)$ of components, one can construct two sequences of discrete barycentric transition probabilities $\{P^l_{t,J}\}_{J\in\mathbb{N}}$ and $\{P^u_{t,J}\}_{J\in\mathbb{N}}$ approximating the original conditional probability $P_t$.

By the product measure theorem [2, Sect. 2.6], the transition probabilities $P^l_{t,J}$ and $P^u_{t,J}$ of stages $t\in\tau_{-0}$ and the degenerate marginal probability measures $P^l_{0,J} := P_0 =: P^u_{0,J}$ can be combined to a unique 'lower' and 'upper' Borel probability measure $P^l_J$ and $P^u_J$, respectively (see also the general procedure described in Sect. 4.1).

$$P^l_J := P^l_{0,J} * P^l_{1,J} * \cdots * P^l_{T,J}, \qquad P^u_J := P^u_{0,J} * P^u_{1,J} * \cdots * P^u_{T,J} \tag{4.36}$$

Below, we will refer to $P^l_J$ and $P^u_J$ as barycentric probability measures. Observe that both barycentric measures have a finite support. Their atoms as well as the associated probabilities can be calculated by means of a forward recursion scheme, as exemplified in [43]. Furthermore, the transition probabilities (4.35) can be viewed as regular conditional probabilities corresponding to the barycentric measures (4.36). By convention, $E^l_{t,J}(\cdot)$ and $E^u_{t,J}(\cdot)$ stand for expectation under the barycentric measures conditional on the information available at time $t$, while the unconditional expectation operators are denoted by $E^l_J(\cdot)$ and $E^u_J(\cdot)$, respectively.
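A forward recursion of the kind mentioned above can be sketched as follows; this is an illustrative assumption, not the scheme of [43]. Starting from the deterministic root, every node is expanded with the atoms of the lower transition probability evaluated at that node's history, and node probabilities are multiplied along each path. The dynamics (`H_ETA`, `H_XI`, `Q`, `D0`, `D1`) are the same made-up data as in the previous sketch, with a single partition component ($I_t(J) = 1$) and $K_t = L_t = 1$. Because each lower transition preserves the conditional mean of $\eta_t$, the tree reproduces $E[\eta_2]$ exactly.

```python
# Forward recursion for a lower barycentric tree as in (4.36); all data are hypothetical.
H_ETA, H_XI = 0.8, 0.5
D0, D1 = -0.5, 0.5
Q = [(-0.3, 0.1, 0.25), (0.4, -0.2, 0.5), (0.9, 0.4, 0.25)]  # stage-independent disturbances


def lower_transition(eta_prev, xi_prev):
    """Atoms (eta_mu, b_mu, q_Xi(mu)) of P^l_t(. | eta_prev, xi_prev) for K_t = L_t = 1."""
    b0, b1 = H_XI * xi_prev + D0, H_XI * xi_prev + D1
    out = []
    for mu, b in enumerate((b0, b1)):
        q = num = 0.0
        for e_eta, e_xi, p in Q:
            t1 = (H_XI * xi_prev + e_xi - b0) / (b1 - b0)
            w = p * (t1 if mu else 1.0 - t1)
            q += w
            num += w * (H_ETA * eta_prev + e_eta)
        out.append((num / q, b, q))
    return out


def build_tree(eta0, xi0, T):
    """Leaves as (path, probability), where path = [(eta_0, xi_0), ..., (eta_T, xi_T)]."""
    nodes = [([(eta0, xi0)], 1.0)]                 # stage 0 is deterministic (Dirac measure)
    for _ in range(T):
        nodes = [(path + [(eta, xi)], prob * q)
                 for path, prob in nodes
                 for eta, xi, q in lower_transition(*path[-1])]
    return nodes


leaves = build_tree(1.0, 0.0, T=2)
mean_eta2 = sum(prob * path[-1][0] for path, prob in leaves)
print(len(leaves), sum(prob for _, prob in leaves), mean_eta2)
```

With $E[\varepsilon^\eta] = 0.35$, the exact mean is $0.8\,(0.8 + 0.35) + 0.35 = 1.27$.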

A priori, for a fixed refinement parameter $J\in\mathbb{N}$, neither $P^l_J$ nor $P^u_J$ need be a good substitute for the original measure $P$ in a given stochastic program. Occasionally, one might wish to improve the barycentric measures by refining the underlying transition probabilities. Thus, we will investigate the convergence behavior of the sequences $\{P^l_J\}_{J\in\mathbb{N}}$ and $\{P^u_J\}_{J\in\mathbb{N}}$ below. Under nonrestrictive conditions both sequences can be shown to converge weakly to the original measure $P$ as the parameter $J$ tends to infinity.
The underlying refinement strategy is arbitrary to a large extent. We will refer to a refinement strategy as regular if the following condition holds: for every tolerance $\varepsilon > 0$ there is a $J_0(\varepsilon)\in\mathbb{N}$ such that

$$\operatorname{diam}\big(\Theta_{t,J,i_t}(\eta^{t-1},\xi^{t-1}) \times \Xi_{t,J,i_t}(\eta^{t-1},\xi^{t-1})\big) < \varepsilon \qquad \forall t\in\tau_{-0},\; J \ge J_0(\varepsilon),\; i_t = 1,\dots,I_t(J),\; (\eta^{t-1},\xi^{t-1}) \in \Theta^{t-1}\times\Xi^{t-1}. \tag{4.37}$$

This condition is nonrestrictive. For instance, concentrating on partitions of the form (4.30)-(4.32), one can easily construct a refinement strategy which satisfies (4.37). Generally speaking, the regularity condition (4.37) ensures weak convergence of $\{P^l_J\}_{J\in\mathbb{N}}$ and $\{P^u_J\}_{J\in\mathbb{N}}$ to the original probability measure $P$, as argued in the following proposition.

Proposition 4.4. For any regular refinement strategy the sequences of discrete barycentric measures $\{P^l_J\}_{J\in\mathbb{N}}$ and $\{P^u_J\}_{J\in\mathbb{N}}$ on the augmented probability space converge weakly to the original probability measure $P$.

Proof. Consider a continuous function $\Phi: \Theta\times\Xi \to \mathbb{R}$, and define

$$\varphi_t := \begin{cases} (E_t\varphi_{t+1}) & \text{for } t < T, \\ \Phi & \text{for } t = T. \end{cases}$$

The conditions (B1) and (B4) imply via the dominated convergence theorem that $\varphi_t$ is uniformly continuous on $\Theta^t\times\Xi^t$ for all $t\in\tau$. Thus, for every tolerance $\varepsilon > 0$ there is an index $J_0(\varepsilon)\in\mathbb{N}$ such that

$$\big|(E_t\varphi_{t+1}) - (E^l_{t,J}\varphi_{t+1})\big| < \varepsilon \tag{4.38}$$

uniformly on the marginal space $\Theta^t\times\Xi^t$ for all $t\in\tau_{-T}$ and for all sufficiently large refinement parameters $J \ge J_0(\varepsilon)$. In fact, proposition 4.3 guarantees that (4.38) holds pointwise, and uniformity of the estimate follows from uniformity of the assumption (4.37). For the sake of transparent notation, in the rest of the proof we suppress the index $J \ge J_0(\varepsilon)$, which will be kept fixed. We will show by backward induction that

$$\big|(E_t\Phi) - (E^l_t\Phi)\big| \le \varepsilon(T-t)$$

uniformly on $\Theta^t\times\Xi^t$ for all $t\in\tau_{-T}$. The basis step for $t = T-1$ follows trivially from (4.38). Next, assume that the claim has been established for stage $t+1$. Thus, we find

$$\begin{aligned} \big|(E_t\Phi) - (E^l_t\Phi)\big| &= \big|(E_t(E_{t+1}\Phi)) - (E^l_t(E^l_{t+1}\Phi))\big| \\ &\le \big|(E_t(E_{t+1}\Phi)) - (E^l_t(E_{t+1}\Phi))\big| + \varepsilon(T-t-1) \\ &\le \varepsilon(T-t). \end{aligned}$$

The first inequality is due to the induction hypothesis, while the second inequality follows from (4.38), since $(E_{t+1}\Phi) = \varphi_{t+1}$. As the choice of $\varepsilon$ was arbitrary, we have convergence of the unconditional expectation $(E^l\Phi)$ to $(E\Phi)$ under refinements.
In turn, as the choice of the continuous function $\Phi$ was arbitrary as well, weak convergence of the sequence of lower measures is established. An analogous argument applies to the sequence of upper measures. □

By replacing the conditional probability $P_t$ in the recourse problem (3.2) with the transition probabilities $P^l_{t,J}$ and $P^u_{t,J}$, a related lower and upper auxiliary recourse problem can be formulated, respectively. For each fixed refinement parameter $J\in\mathbb{N}$ we introduce a sequence of auxiliary value functions $\{\Phi^l_{t,J}\}_{t\in\tau}$ corresponding to the lower transition probabilities $P^l_{t,J}$ and an analogous sequence of auxiliary value functions $\{\Phi^u_{t,J}\}_{t\in\tau}$ corresponding to the upper transition probabilities $P^u_{t,J}$. The auxiliary value functions are defined by backward recursion. In the last decision stage set

$$\begin{aligned} \Phi^l_{T,J}(x^{T-1},\eta^T,\xi^T) &:= \sup_{x_T}\, \rho_T(x^T,\eta^T,\xi^T), \\ \Phi^u_{T,J}(x^{T-1},\eta^T,\xi^T) &:= \sup_{x_T}\, \rho_T(x^T,\eta^T,\xi^T), \end{aligned} \tag{4.39a}$$

and for $t = T-1,\dots,1$ define

$$\begin{aligned} \Phi^l_{t,J}(x^{t-1},\eta^t,\xi^t) &:= \sup_{x_t}\, \rho_t(x^t,\eta^t,\xi^t) + (E^l_{t,J}\Phi^l_{t+1,J})(x^t,\eta^t,\xi^t), \\ \Phi^u_{t,J}(x^{t-1},\eta^t,\xi^t) &:= \sup_{x_t}\, \rho_t(x^t,\eta^t,\xi^t) + (E^u_{t,J}\Phi^u_{t+1,J})(x^t,\eta^t,\xi^t). \end{aligned} \tag{4.39b}$$

Finally, the auxiliary value functions of the subordinate first stage problems only depend on the random vector $(\eta_0,\xi_0)$.

$$\begin{aligned} \Phi^l_{0,J}(\eta_0,\xi_0) &:= \sup_{x_0}\, \rho_0(x_0,\eta_0,\xi_0) + (E^l_{0,J}\Phi^l_{1,J})(x_0,\eta_0,\xi_0), \\ \Phi^u_{0,J}(\eta_0,\xi_0) &:= \sup_{x_0}\, \rho_0(x_0,\eta_0,\xi_0) + (E^u_{0,J}\Phi^u_{1,J})(x_0,\eta_0,\xi_0). \end{aligned} \tag{4.39c}$$

The expectation functionals are defined in the obvious way as in (3.3); simply replace the original conditional probabilities by the lower and upper barycentric transition probabilities, respectively. Furthermore, the optimal values

$$(E^l_J\Phi^l_{0,J}) := \int_{\Theta_0\times\Xi_0} \Phi^l_{0,J}(\eta_0,\xi_0)\, dP^l_{0,J}(\eta_0,\xi_0), \qquad (E^u_J\Phi^u_{0,J}) := \int_{\Theta_0\times\Xi_0} \Phi^u_{0,J}(\eta_0,\xi_0)\, dP^u_{0,J}(\eta_0,\xi_0)$$

corresponding to the dynamic programs (4.39) are each given by the unconditional expectation of the respective first stage recourse function. It is intuitively clear that the barycentric probability measures $P^l_J$ and $P^u_J$ can be used to establish two auxiliary stochastic programs

$$\sup_{x\in\mathcal{N}^l_J} \int_{\Theta\times\Xi} \sum_{t=0}^{T} \rho_t(x^t,\eta^t,\xi^t)\, dP^l_J(\eta,\xi) \quad \text{s.t.}\quad f_t(x^t,\xi^t) \le 0 \;\; P^l_J\text{-a.s.} \;\; \forall t\in\tau \tag{4.40a}$$
and

$$\sup_{x\in\mathcal{N}^u_J} \int_{\Theta\times\Xi} \sum_{t=0}^{T} \rho_t(x^t,\eta^t,\xi^t)\, dP^u_J(\eta,\xi) \quad \text{s.t.}\quad f_t(x^t,\xi^t) \le 0 \;\; P^u_J\text{-a.s.} \;\; \forall t\in\tau \tag{4.40b}$$

The linear spaces $\mathcal{N}^l_J$ and $\mathcal{N}^u_J$ contain all non-anticipative policy functions which are essentially bounded with respect to the measures $P^l_J$ and $P^u_J$, respectively. We will refer to (4.40) as the lower and upper auxiliary stochastic programs associated with the underlying optimization problem (3.1). As the measures $P^l_J$ and $P^u_J$ are discrete, the auxiliary stochastic programs (4.40) are, in principle, accessible to numerical solution.



4.5 Bounds on the Optimal Value 

By successively applying the inequalities (4.17) and (4.21) to minorize (majorize) the expectation functionals of a stochastic program subject to the regularity conditions (B1)-(B5), it can be shown that the optimal value is bounded below by $(E^l_J\Phi^l_{0,J})$ and bounded above by $(E^u_J\Phi^u_{0,J})$. Unsurprisingly, these bounds tend to become tighter as the refinement parameter $J$ is increased. Unlike the true optimal value, the bounds are computationally accessible, as their calculation merely requires the solution of finite-dimensional stochastic programs. Moreover, the difference $(E^u_J\Phi^u_{0,J}) - (E^l_J\Phi^l_{0,J})$ can be viewed as a measure for the 'appropriateness' of the auxiliary stochastic programs (4.40) to approximate the original problem (3.1). These ideas will be formalized in the remainder of this section.



Proposition 4.5. Consider a stochastic program of the form (3.1), which satisfies the fundamental regularity conditions (B1)-(B5). Then, the following hold for all $J\in\mathbb{N}$:

(a) the transition probabilities $P^l_{t,J}$ and $P^u_{t,J}$ are well-defined for $t\in\tau$ (but not unique) and define two discrete probability measures $P^l_J$ and $P^u_J$ on the measurable space $(\Theta\times\Xi, \mathcal{B}(\Theta\times\Xi))$;

(b) the auxiliary stochastic programs (4.40) satisfy the regularity conditions (A1)''-(A5)'';

(c) the auxiliary value functions $\Phi^l_{t,J}$ and $\Phi^u_{t,J}$ are concave in the decision vector $x^{t-1}$ for all $t\in\tau_{-0}$.
Proof. Assertion (a) follows from (B1), (B4), the construction of the transition probabilities $P^l_{t,J}$ and $P^u_{t,J}$, as well as the product measure theorem [2, Sect. 2.6]. Assertion (b) is straightforward. By construction, the auxiliary stochastic programs (4.40) are subject to the conditions (B1), (B2), (B3), and (B5). However, they do not necessarily satisfy condition (B4). Thus, arguing as in corollary 3.15, one shows that the auxiliary value functions $\Phi^l_{t,J}$ and $\Phi^u_{t,J}$ are concave in the decision variables on the entire underlying space. This observation implies assertion (c). Due to the failure of assumption (B4), neither subdifferentiability nor the saddle property can be established. □

Proposition 4.5 (b) implies that the static and dynamic versions of the auxiliary stochastic programs (4.40) are solvable and equivalent. Furthermore, the auxiliary recourse functions are usc and bounded on their natural domains. In view of the results in proposition 4.1 one might expect monotonicity of the auxiliary recourse functions upon increase of the refinement parameter. This is in fact provable provided that the auxiliary stochastic programs (4.40) comply with the regularity condition (B4); see proposition 4.6 below. However, as remarked in the proof of proposition 4.5, condition (B4) usually fails to hold. Consequently, monotonicity fails to hold in general.

Proposition 4.6 (Monotonicity). Consider two sequences of lower and upper auxiliary stochastic programs of the form (4.40), each of which satisfies the fundamental regularity conditions (B1)-(B5). Then, for all $t\in\tau$ we have

$$\Phi^l_{t,J} \ge \Phi^l_{t,J'} \quad\text{and}\quad \Phi^u_{t,J} \le \Phi^u_{t,J'} \qquad \text{on } Z^t \;\; \forall J \ge J'. \tag{4.41}$$

Proof. The inequalities (4.41) are shown by backward induction with respect to the index $t$. By (4.39a) we have $\Phi^l_{T,J} = \Phi^l_{T,J'}$ and $\Phi^u_{T,J} = \Phi^u_{T,J'}$ for all refinement parameters $J, J'\in\mathbb{N}$. Thus, the basis step is obvious. Then, assume that (4.41) has already been established for stage $t+1$. As the auxiliary stochastic programs are subject to the conditions (B1)-(B5), we may conclude by theorems 3.14 and 3.18 that $\Phi^l_{t,J}$ and $\Phi^u_{t,J}$ are subdifferentiable saddle functions on $Z^t$ for all $(t,J)\in\tau\times\mathbb{N}$. Thus, for all $J \ge J'$ we find

$$\begin{aligned} (E^l_{t,J}\Phi^l_{t+1,J}) &\ge (E^l_{t,J}\Phi^l_{t+1,J'}) \ge (E^l_{t,J'}\Phi^l_{t+1,J'}), \\ (E^u_{t,J}\Phi^u_{t+1,J}) &\le (E^u_{t,J}\Phi^u_{t+1,J'}) \le (E^u_{t,J'}\Phi^u_{t+1,J'}) \end{aligned}$$

on the natural domain $Y^t$. The first inequalities on both lines are due to the induction hypothesis, while the second inequalities follow from proposition 4.1 as well as the saddle property of $\Phi^l_{t+1,J'}$ and $\Phi^u_{t+1,J'}$, respectively. An elementary argument then proves (4.41) for stage $t$, which completes the induction step. □

The assumptions of proposition 4.6 are fulfilled, e.g., if the underlying original stochastic program complies with the regularity conditions (B1)-(B5), and for each $(t,J)\in\tau_{-0}\times\mathbb{N}$ the $J$'th partition of $P_t$ is of the form (4.30)-(4.32).
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Below, the crucial theorem 4.7 clarifies under what conditions the auxiliary 
stochastic programs (4.40) provide lower and upper bounds on the optimal 
value of the original stochastic program (3.1). 

Theorem 4.7 (Bounds on the Recourse Functions). Consider a multistage
stochastic program which satisfies the conditions (B1)-(B5). Then, $\Phi_t$
is squeezed between the auxiliary value functions, i.e. we have $\Phi^l_{t,J} \le \Phi_t \le \Phi^u_{t,J}$
on $Z^t$ for all $(t, J) \in \tau \times \mathbb{N}$. Similarly, the optimal value of the original problem
is bounded:

$$(E^l_J\Phi^l_{0,J}) \le (E\Phi_0) \le (E^u_J\Phi^u_{0,J}).$$



Proof. For the sake of better readability we suppress the index J, which
is kept fixed in this proof. Theorems 3.14 and 3.18 imply that the
recourse functions are subdifferentiable saddle functions on a neighborhood
of their natural domains. This property is important for the present
theorem, which will be proved by backward induction. By definition we have

$$\Phi^l_T(x^{T-1},\eta^T,\xi^T) = \Phi_T(x^{T-1},\eta^T,\xi^T) = \Phi^u_T(x^{T-1},\eta^T,\xi^T),$$



and thus the basis step is established. Let us now assume that we have already
shown $\Phi^l_{t+1} \le \Phi_{t+1} \le \Phi^u_{t+1}$ on $Z^{t+1}$. By using the barycentric approximations
we can easily derive a similar chain of inequalities for the expectation functionals:



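$$\begin{aligned}
\int \Phi^l_{t+1}(x^t,\eta^{t+1},\xi^{t+1})\,dP^l_{t+1}(\eta_{t+1},\xi_{t+1}\,|\,\eta^t,\xi^t)
&\le \int \Phi_{t+1}(x^t,\eta^{t+1},\xi^{t+1})\,dP^l_{t+1}(\eta_{t+1},\xi_{t+1}\,|\,\eta^t,\xi^t)\\
&\le \int \Phi_{t+1}(x^t,\eta^{t+1},\xi^{t+1})\,dP_{t+1}(\eta_{t+1},\xi_{t+1}\,|\,\eta^t,\xi^t)\\
&\le \int \Phi_{t+1}(x^t,\eta^{t+1},\xi^{t+1})\,dP^u_{t+1}(\eta_{t+1},\xi_{t+1}\,|\,\eta^t,\xi^t)\\
&\le \int \Phi^u_{t+1}(x^t,\eta^{t+1},\xi^{t+1})\,dP^u_{t+1}(\eta_{t+1},\xi_{t+1}\,|\,\eta^t,\xi^t).
\end{aligned}$$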

The first and the fourth inequalities follow from the induction hypothesis,
whereas the second and the third inequalities apply because of (4.17), (4.21),
and the construction of the transition probabilities $P^l_{t+1}$ and $P^u_{t+1}$; they also make use of
the saddle structure and the subdifferentiability of $\Phi_{t+1}$. Comparison of the
objective functions in (3.2b) and (4.39b) entails



$$\Phi^l_t(x^{t-1},\eta^t,\xi^t) \le \Phi_t(x^{t-1},\eta^t,\xi^t) \le \Phi^u_t(x^{t-1},\eta^t,\xi^t).$$



This statement completes the induction step. □ 

Theorem 4.7 is a combination of lemma 4.2 and theorem 4.1 in [43]. Subsequently,
we investigate the convergence properties of the auxiliary value
functions (4.39). As a key ingredient we will need condition (4.37), which basically
ensures weak convergence of the sequences $\{P^l_J\}_{J\in\mathbb{N}}$ and $\{P^u_J\}_{J\in\mathbb{N}}$ to
the original probability measure P. The following theorem is motivated by
the results in [44, Sect. 5].

Theorem 4.8 (Convergence). Consider a multistage stochastic program
subject to the regularity conditions (B1)-(B5). Then, the sequences $\{\Phi^l_{t,J}\}_{J\in\mathbb{N}}$
and $\{\Phi^u_{t,J}\}_{J\in\mathbb{N}}$ converge to $\Phi_t$ uniformly on $Z^t$ for all $t \in \tau$, provided that the
underlying refinement strategy is regular (see (4.37)).

Proof. By theorem 3.18 the recourse function $\Phi_t$ is uniformly continuous on
its natural domain $Z^t$ for all $t \in \tau$. Thus, for every tolerance $\varepsilon > 0$ we find

$$(E_t\Phi_{t+1}) - (E^l_{t,J}\Phi_{t+1}) < \varepsilon \qquad (4.42)$$

uniformly on $Y^t$ for all $t \in \tau_T$ and for all sufficiently large refinement parameters
$J \ge J_0(\varepsilon)$. In fact, proposition 4.3 guarantees that (4.42) holds
pointwise, and uniformity of the estimate follows from uniformity of the assumption
(4.37). For the sake of transparency, in the rest of the proof we
suppress the index $J \ge J_0(\varepsilon)$, which will be kept fixed. We will argue that

$$0 \le \Phi_t - \Phi^l_t \le \varepsilon\,(T - t)$$

uniformly on $Z^t$ for all $t \in \tau$. The claim is established by backward induction.
The basis step is trivially fulfilled since, by definition, $\Phi_T$ equals $\Phi^l_T$ on the
entire underlying space. Next, assume that $\Phi_{t+1} - \Phi^l_{t+1} \le \varepsilon\,(T - t - 1)$. Then,
we may conclude that

$$\begin{aligned}
\Phi^l_t(x^{t-1},\eta^t,\xi^t) &= \sup_{x_t}\ \rho_t(x^t,\eta^t,\xi^t) + (E^l_t\Phi^l_{t+1})(x^t,\eta^t,\xi^t)\\
&\ge \sup_{x_t}\ \rho_t(x^t,\eta^t,\xi^t) + (E^l_t\Phi_{t+1})(x^t,\eta^t,\xi^t) - \varepsilon\,(T-t-1)\\
&\ge \sup_{x_t}\ \rho_t(x^t,\eta^t,\xi^t) + (E_t\Phi_{t+1})(x^t,\eta^t,\xi^t) - \varepsilon\,(T-t)\\
&= \Phi_t(x^{t-1},\eta^t,\xi^t) - \varepsilon\,(T-t)
\end{aligned}$$

uniformly on $Z^t$. The first inequality is due to the induction hypothesis, while
the second inequality follows from (4.42). Furthermore, by theorem 4.7 we
find $\Phi_t - \Phi^l_t \ge 0$ uniformly on $Z^t$, which completes the induction step. As the
tolerance $\varepsilon$ was arbitrary, convergence of the lower sequence $\{\Phi^l_{t,J}\}_{J\in\mathbb{N}}$ to the
original recourse function $\Phi_t$ is established. An analogous argument applies to
the upper sequence $\{\Phi^u_{t,J}\}_{J\in\mathbb{N}}$. □



4.6 Bounding Sets for the Optimal Decisions 

In the previous section we developed computable bounds bracketing the optimal
value of a stochastic program which satisfies the regularity conditions
(B1)-(B5). In practice, however, a decision maker is interested not only in the
optimal value but also in the optimal policy of a given stochastic optimization
problem. In the sequel, we will argue that bounds on the recourse functions
entail bounding sets for the optimal first stage decisions. Our exposition is
inspired by Pflug [84, Sect. 1.3].⁶ As before, it is convenient to use the standard
assumption that the random outcomes at stage 0 are deterministic. Thus, one
may consider $\eta_0$ and $\xi_0$ as fixed parameters, implying that

$$(E\Phi_0) = \Phi_0(\eta_0,\xi_0),\qquad (E^l_J\Phi^l_{0,J}) = \Phi^l_{0,J}(\eta_0,\xi_0),\qquad (E^u_J\Phi^u_{0,J}) = \Phi^u_{0,J}(\eta_0,\xi_0).$$

Next, introduce three extended-real-valued functionals

$$\begin{aligned}
F(x_0) &:= \rho_0(x_0,\eta_0,\xi_0) + (E_0\Phi_1)(x_0,\eta_0,\xi_0),\\
F^l_J(x_0) &:= \rho_0(x_0,\eta_0,\xi_0) + (E^l_{0,J}\Phi^l_{1,J})(x_0,\eta_0,\xi_0), \qquad (4.43)\\
F^u_J(x_0) &:= \rho_0(x_0,\eta_0,\xi_0) + (E^u_{0,J}\Phi^u_{1,J})(x_0,\eta_0,\xi_0).
\end{aligned}$$

We assume that the auxiliary functionals $F^l_J$ and $F^u_J$ can be evaluated
numerically, whereas F is computationally intractable. Moreover, assume that the
underlying stochastic program satisfies the conditions (B1)-(B5). Then we have

$$(E\Phi_0) = \sup_{x_0} F(x_0),\qquad (E^l_J\Phi^l_{0,J}) = \sup_{x_0} F^l_J(x_0),\qquad (E^u_J\Phi^u_{0,J}) = \sup_{x_0} F^u_J(x_0),$$

and $F^l_J \le F \le F^u_J$ on $\mathbb{R}^{n_0}$. Therefore, the functional F is squeezed between
the auxiliary functionals $F^l_J$ and $F^u_J$ on the entire underlying space. This
observation implies that the maximizers of F are necessarily contained in the
set of all $x_0$ such that $F^u_J(x_0)$ is greater than or equal to the maximum of the lower
bounding function $F^l_J$ (see Fig. 4.3). Formally, one may write

$$\arg\max F \subseteq C_J := \{x_0 \mid F^u_J(x_0) \ge \sup F^l_J\}. \qquad (4.44)$$

It is obvious from the definitions that the upper level set $C_J$ also contains the
maximizers of the auxiliary functionals (see Fig. 4.3), i.e.

$$\arg\max F^l_J \cup \arg\max F^u_J \subseteq C_J. \qquad (4.45)$$

Moreover, implementation of any first stage decision in $\arg\max F^l_J$ results
in an expected profit of at least $\sup F^l_J$, provided that all recourse decisions

6 In that reference, three lsc functions $F$, $F^l$, and $F^u$ are considered. These functions
are subject to specific conditions. First, $F$ is squeezed between $F^l$ and $F^u$, i.e.
$F^l \le F \le F^u$. Moreover, the minima of $F^l$ and $F^u$ are finite and known, and $F^l$
satisfies a suitable growth condition. Then, the distance between the minimizers of
$F$ and $F^l$ can be estimated.
are chosen optimally with respect to the original probability measure. Notice
that $C_J$ represents a bounding set for the optimal decisions at stage 0 as it
contains all maximizers of F. Moreover, $C_J$ is compact. This follows from
the observation that $C_J$ is an upper level set of $F^u_J$ and thus contained in
the first stage feasible set $X_0(\xi_0)$. The level set $C_J$ can in principle be found
by scanning the space of first stage decisions. In fact, one can always decide
whether some $x_0 \in \mathbb{R}^{n_0}$ lies within $C_J$ by simply calculating $F^u_J(x_0)$ and
comparing the result with $\sup F^l_J$. Thus, scanning requires maximization of
$F^l_J$ and repeated evaluation of $F^u_J$ at specific reference points, both of which
are practicable operations. However, unsystematic scanning is very inefficient
for $n_0 > 1$. In case of a multidimensional decision space one should exploit
structural properties of the functionals (4.43). By assumption, the underlying
stochastic program complies with the regularity conditions (B1)-(B5). Then,
proposition 4.5 ensures concavity of $F^l_J$ and $F^u_J$. In particular, $F^u_J(x_0)$ can be
interpreted as the optimal value of (4.40b) given that the first stage decision is
fixed at $x_0$. Any Lagrange multiplier associated with this extra constraint in
(4.40b), i.e. the equality constraint which fixes $x_0$, represents a subgradient
of $F^u_J$ at $x_0$. Subgradient information of this type might be useful for an
efficient numerical evaluation of the level set $C_J$.
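The brute-force scanning described above can be sketched as follows, assuming that $F^l_J$ and $F^u_J$ are available as black-box callables and that a finite grid on $[0,1]^2$ stands in for the feasible set $X_0(\xi_0)$. The quadratic functions below are hypothetical concave stand-ins chosen only so that $F^l_J \le F \le F^u_J$ holds; they are not derived from any stochastic program in the text, and `scan_bounding_set` is a hypothetical helper name.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def scan_bounding_set(
    f_low: Callable[[Point], float],
    f_up: Callable[[Point], float],
    grid: List[Point],
) -> Tuple[float, List[Point]]:
    """Grid approximation of C_J = {x0 | F^u_J(x0) >= sup F^l_J}, cf. (4.44)."""
    # The grid maximum never exceeds the true sup of f_low, so the returned set
    # errs on the safe side: it can only be larger than C_J restricted to the grid.
    sup_low = max(f_low(x) for x in grid)
    members = [x for x in grid if f_up(x) >= sup_low]
    return sup_low, members


# Hypothetical concave stand-ins for F, F^l_J and F^u_J with F^l_J <= F <= F^u_J on [0, 1]^2.
def f_true(x: Point) -> float:
    return -((x[0] - 0.3) ** 2 + (x[1] - 0.6) ** 2)


def f_low(x: Point) -> float:
    return f_true(x) - 0.02 - 0.01 * x[0]


def f_up(x: Point) -> float:
    return f_true(x) + 0.02 + 0.01 * x[1]


grid = [(i / 100, j / 100) for i in range(101) for j in range(101)]
sup_low, bounding_set = scan_bounding_set(f_low, f_up, grid)
print(f"sup F^l ~ {sup_low:.4f}; {len(bounding_set)} of {len(grid)} grid points lie in C_J")
```

The cost grows exponentially with $n_0$, which is why the text recommends exploiting concavity and subgradient information instead of plain scanning in higher dimensions.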




The above analysis can be further simplified by concentrating on one single
component of the first stage decision vector. Denote by $F^l_{J,k}(x_{0,k})$ and
$F^u_{J,k}(x_{0,k})$ the optimal values of the auxiliary stochastic programs (4.40a) and
(4.40b), respectively, and let $F_k(x_{0,k})$ be the optimal value of (3.1), where in
all three problems the kth component of $x_0$ is additionally fixed at $x_{0,k}$. By
definition, we find



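$$\begin{aligned}
F_k(x_{0,k}) &= \sup\{F(x_0') \mid x'_{0,k} = x_{0,k}\},\\
F^l_{J,k}(x_{0,k}) &= \sup\{F^l_J(x_0') \mid x'_{0,k} = x_{0,k}\},\\
F^u_{J,k}(x_{0,k}) &= \sup\{F^u_J(x_0') \mid x'_{0,k} = x_{0,k}\},
\end{aligned}$$

and $F^l_{J,k} \le F_k \le F^u_{J,k}$ on $\mathbb{R}$ for all $k = 1,\dots,n_0$. Moreover, we have

$$(E\Phi_0) = \sup_{x_{0,k}} F_k(x_{0,k}),\qquad (E^l_J\Phi^l_{0,J}) = \sup_{x_{0,k}} F^l_{J,k}(x_{0,k}),\qquad (E^u_J\Phi^u_{0,J}) = \sup_{x_{0,k}} F^u_{J,k}(x_{0,k}).$$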

The maximizers of $F_k$ can be estimated in analogy to (4.44):

$$\arg\max F_k \subseteq C_{J,k} := \{x_{0,k} \mid F^u_{J,k}(x_{0,k}) \ge \sup F^l_{J,k}\}. \qquad (4.46)$$

In contrast to (4.44), however, (4.46) has the advantage that $F^l_{J,k}$ and $F^u_{J,k}$ are functions of one
single real variable only.⁷ Upper and lower bounds on the optimal $x_{0,k}$ are
thus provided by the two zeros of the concave function $F^u_{J,k} - \sup F^l_{J,k}$, which
can conveniently be calculated by means of Newton-type methods.
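A minimal Newton sketch for the two zeros of $g = F^u_{J,k} - \sup F^l_{J,k}$, assuming that $g$ and a derivative (or supergradient, e.g. obtained from the Lagrange multiplier mentioned above) can be evaluated. The quadratic data, the value 0.91, and the helper name `newton_zero_concave` are hypothetical and serve only to show the mechanics.

```python
from typing import Callable


def newton_zero_concave(
    g: Callable[[float], float],
    dg: Callable[[float], float],
    x: float,
    tol: float = 1e-12,
    max_iter: int = 100,
) -> float:
    """Newton iteration for a zero of a concave function g, started where g(x) < 0.

    Because the tangent of a concave function lies above its graph, every Newton
    step stays on the same side of the nearest zero and approaches it monotonically."""
    for _ in range(max_iter):
        gx = g(x)
        if abs(gx) <= tol:
            break
        x -= gx / dg(x)  # dg(x): derivative (or supergradient) of g at x
    return x


# Hypothetical one-dimensional data: F^u_{J,k}(x) = 1 - (x - 0.4)^2 and sup F^l_{J,k} = 0.91.
def g(x: float) -> float:
    return (1.0 - (x - 0.4) ** 2) - 0.91


def dg(x: float) -> float:
    return -2.0 * (x - 0.4)


lower_end = newton_zero_concave(g, dg, x=-1.0)  # start to the left of C_{J,k}
upper_end = newton_zero_concave(g, dg, x=2.0)   # start to the right of C_{J,k}
print(f"C_J,k ~ [{lower_end:.6f}, {upper_end:.6f}]")
```

The starting points must lie outside $C_{J,k}$; if $g$ is still nonnegative at an end of the feasible interval for $x_{0,k}$, that end itself is the corresponding bound.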

Now, let us return to the multidimensional bounding sets $C_J$ introduced in
(4.44). As argued in the following theorem, the sequence $\{C_J\}_{J\in\mathbb{N}}$ converges⁸
to the set of maximizers of the functional F.

Theorem 4.9 (Convergence). Consider a multistage stochastic program
subject to the regularity conditions (B1)-(B5). Then, the sequence $\{C_J\}_{J\in\mathbb{N}}$
converges to $\arg\max F$ provided that the underlying refinement strategy is
regular in the sense of (4.37).

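Proof. By definition of the bounding set $C_J$ it is clear that

$$\arg\max F \subseteq C_J \subseteq \operatorname{dom} F \quad \forall J \in \mathbb{N}, \qquad (4.47)$$

where $\operatorname{dom} F$ coincides with the first stage feasible set $X_0(\xi_0)$. In addition,
we will prove the following implication:

$$x_0 \notin \arg\max F \;\Longrightarrow\; \text{there is a neighborhood } U \text{ of } x_0 \text{ and } J_0 \in \mathbb{N} \text{ such that } U \cap C_J = \emptyset\ \ \forall J \ge J_0. \qquad (4.48)$$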

7 Notice, however, that the intervals $\{C_{J,k}\}_{k=1}^{n_0}$ do not provide full information
about the bounding set $C_J$. In fact, $C_J$ is usually a strict subset of the Cartesian
product $C_{J,1} \times \dots \times C_{J,n_0}$.

8 For a survey on the theory of set convergence see [97, Chap. 4].
In fact, given a fixed decision $x_0$ in the complement of $\arg\max F$, we will
show that some neighborhood U of $x_0$ lies in the complement of $C_J$ for all
sufficiently large refinement parameters J. As $x_0$ is no optimizer of F, and by
closedness of $\arg\max F$, there is a compact neighborhood U of $x_0$ such that

$$U \cap \arg\max F = \emptyset.$$

Upper semicontinuity of F and compactness of U then ensure the existence
of an $\varepsilon > 0$ with

$$F(x_0') \le \sup F - 2\varepsilon \quad \forall x_0' \in U \cap \operatorname{dom} F.$$

Moreover, by theorem 4.8 the sequences $\{F^l_J\}_{J\in\mathbb{N}}$ and $\{F^u_J\}_{J\in\mathbb{N}}$ converge to
the functional F uniformly on $\operatorname{dom} F$. Consequently, there exists a refinement
parameter $J_0 \in \mathbb{N}$ such that

$$F^u_J - F < \varepsilon \quad\text{and}\quad F - F^l_J < \varepsilon$$

on $\operatorname{dom} F$ and for all $J \ge J_0$. Combining the above inequalities we obtain

$$F^u_J(x_0') < F(x_0') + \varepsilon \le \sup F - \varepsilon < \sup F^l_J \quad \forall x_0' \in U \cap \operatorname{dom} F.$$

Then, by the definition (4.44) of the bounding set $C_J$, it is immediately clear
that the set U lies in the complement of $C_J$ for all $J \ge J_0$. Thus, (4.48)
follows. In summary, we may conclude that

$$\arg\max F \subseteq \liminf_{J\to\infty} C_J \subseteq \limsup_{J\to\infty} C_J \subseteq \arg\max F,$$

where the first inclusion follows from (4.47), and the third inclusion follows
from (4.48). Thus, the limit of the sequence $\{C_J\}_{J\in\mathbb{N}}$ exists and coincides with
the set of maximizers of F. This observation completes the proof. □

Corollary 4.10. Let $\{x_{0,J}\}_{J\in\mathbb{N}}$ be a converging sequence in $\mathbb{R}^{n_0}$ with limiting
point $x_0$. Moreover, assume that $x_{0,J} \in \arg\max F^l_J$ for every $J \in \mathbb{N}$. Then,
$x_0$ is an element of $\arg\max F$. The same holds true if we assume that $x_{0,J} \in
\arg\max F^u_J$ for every $J \in \mathbb{N}$.

Proof. The assertion follows immediately from the inclusion (4.45) and the
fact that the bounding sets $\{C_J\}_{J\in\mathbb{N}}$ converge to $\arg\max F$. □

In [42, Sect. 18] the statement of corollary 4.10 is derived for the two-stage
case by using the concept of epi-convergence due to Attouch and Wets [3]. The
approach presented here provides sharper results, as we know for sure that
$\arg\max F$ is covered by the compact bounding set $C_J$ for every $J \in \mathbb{N}$, and
the sequence $\{C_J\}_{J\in\mathbb{N}}$ converges to $\arg\max F$.
Extensions



Many decision problems under uncertainty which are of economic or technical
interest can be formulated as multistage stochastic programs. As we have
seen in the previous chapter, such optimization problems are conveniently discretized
by means of the barycentric approximation scheme. Loosely speaking,
the most crucial prerequisites for barycentric approximation to yield upper
and lower bounds on the recourse functions are:

(a) the given stochastic program is a convex optimization problem (without 
loss of generality, we concentrate on maximization problems); 

(b) the profit functions are convex in the stochastic parameters; 

(c) the constraint functions are jointly convex in the decision variables and 
the stochastic parameters; 

(d) the stochastic parameters follow a block-diagonal autoregressive process. 

Although the first requirement seems to be fairly restrictive, many optimization
problems of practical relevance are actually convex. However, the remaining
conditions (b)-(d) severely limit the scope of the barycentric approximation
scheme.¹ In an economic context, for instance, decision problems involving
lognormally distributed prices and demands, derivative trading, or risk
aversion do not meet the requirements (b), (c), and (d) simultaneously. Sometimes,
these conditions can be enforced by redefining the random parameters
and appropriately transforming the underlying probability distributions.² But
if the random variables are serially correlated, this approach generally fails.
From an applied point of view it would be advantageous to circumvent the unnatural
restrictions (b) and (c) and to extend the barycentric approximation

1 Naive application of the barycentric approximation scheme to problems which 
fail to satisfy the conditions (b)-(d) may provide useful results, but no error bounds. 

2 This is always possible for linear two-stage stochastic programs. 
scheme to broader problem classes. However, from a theoretical perspective
it is highly questionable whether the auxiliary value functions then still provide
upper and lower bounds on the original recourse functions. In order to
clarify the role of the auxiliary value functions in a less restrictive setting,
we will study below the structural properties of convex multistage stochastic
programs with a general nonconvex dependence on the random variables. In
particular, Sect. 5.1 is devoted to the study of profit functions which are nonconvex
in $\eta$, whereas Sect. 5.2 investigates generalized constraint functions
which are nonconvex in $\xi$. We will argue that under weak regularity conditions
the recourse functions of these generalized stochastic programs still have arbitrarily
tight upper and lower bounds, which are computationally accessible.
Moreover, the bounds basically coincide with the well-known auxiliary value
functions shifted by specific random variables.

Below, we frequently work with the barycentric measures (4.36). Moreover, 
we further investigate the auxiliary recourse functions (4.39) introduced in 
Chap. 4. However, for the sake of transparent notation we will usually suppress
the refinement parameter $J \in \mathbb{N}$.



5.1 Stochasticity of the Profit Functions 



In this section we consider again a convex multistage stochastic program of
the form (3.1). As argued in Sect. 3.5, the assumptions (B1)-(B5) ensure that
the recourse function $\Phi_t$ of such an optimization problem is subdifferentiable
and has a characteristic saddle structure on a neighborhood of $Z^t$, $t \in \tau$. In
the sequel, we shall modify the technical condition (B2) in order to allow for
profit functions with a nonconvex $\eta$-dependence. To this end, we need the
following definition.

Definition 5.1. For any $t \in \tau$ the profit function $\rho_t$ is called regularizable if
there exists a continuous mapping $\alpha_t : \mathbb{R}^{K^t} \to \mathbb{R}$ with the following properties:
$\alpha_t$ is convex in $\eta^t$ on a convex neighborhood of $\Theta^t$, and $\rho_t + \alpha_t$ is a saddle
function on a convex neighborhood of $Y^t$, being concave in $x^t$, convex in $\eta^t$,
and constant in $\xi^t$.

Notice that if $\rho_t$ is regularizable, then it is necessarily concave in $x^t$, constant
in $\xi^t$, and continuous on a convex neighborhood of $Y^t$. Below we will refer
to the functions $\{\alpha_t\}_{t\in\tau}$ as correction terms. Next, we can state the modified
regularity conditions.
(C1) = (B1);

(C2) - the profit function $\rho_t$ is continuous on the underlying Euclidean
       space and concave in $x^t$ for fixed values of the stochastic parameters,
       $t \in \tau$;

     - $\rho_t$ is regularizable for every $t \in \tau$;

(C3) = (B3); (C4) = (B4); (C5) = (B5).



These conditions are assumed to hold throughout this section. Note that assumption
(C2) includes assumption (B2) as a special case. In fact, if (B2)
holds, the trivial correction terms $\{\alpha_t = 0\}_{t\in\tau}$ suffice to regularize the profit
functions of all decision stages. However, it is important to realize that regularizable
profit functions can have a generalized nonconvex $\eta$-dependence.
Below, proposition 5.3 identifies a large class of regularizable profit functions
and demonstrates the construction of suitable correction terms.








Fig. 5.1. Regularization of a biconcave profit function 
Suppressing the refinement parameter $J \in \mathbb{N}$ for better readability, it is
useful to introduce the auxiliary value functions $\{\Phi^l_t\}_{t\in\tau}$ and $\{\Phi^u_t\}_{t\in\tau}$ as in
(4.39) and the discrete barycentric measures $P^l$ and $P^u$ as in (4.36). The
following proposition 5.2 ensures that the auxiliary value functions and the
recourse functions of the original problem are well-defined.

Proposition 5.2. Consider a multistage stochastic program satisfying the
regularity conditions (C1)-(C5). Then the following hold:

(a) the transition probabilities $P^l_t$ and $P^u_t$ are well-defined for $t \in \tau$ and define
two discrete probability measures $P^l$ and $P^u$ on $(\Theta \times \Xi, \mathcal{B}(\Theta \times \Xi))$;

(b) the original stochastic program complies with the conditions (A1)-(A5),
and the associated auxiliary programs satisfy (A1)''-(A5)'';

(c) the value functions $\Phi_t$, $\Phi^l_t$, and $\Phi^u_t$ are concave in the decision vector $x^{t-1}$
for all $t \in \tau_0$.

Proof. Assertion (a) follows from (C1) and (C4). Moreover, propositions
3.6 through 3.9 remain valid under the new regularity conditions (C1)-(C5)
without modification of the corresponding proofs. These results can be used
to show that the regularity conditions (C1)-(C5) entail (A1)-(A5). As the
barycentric measures are discrete, we may then conclude that the auxiliary
stochastic programs satisfy (A1)''-(A5)''. Thus, (b) follows. Finally, statement
(c) is proved precisely as in proposition 4.5. □

Part (b) of proposition 5.2 implies that the static and dynamic versions of 
the stochastic program under consideration are both solvable and equivalent. 
The same is true for the associated auxiliary stochastic programs. Moreover, 
the generalized feasible sets $Y^t$ and $Z^t$ are convex and compact for each $t \in \tau$.
Compactness is needed for the proof of the following proposition 5.3, which 
basically states that smooth profit functions are always regularizable. 

Proposition 5.3. For some $t \in \tau$ assume that $\rho_t$ is a smooth function on a
convex neighborhood of the compact set $Y^t$, being twice continuously differentiable
in $\eta^t$, concave in $x^t$, and constant in $\xi^t$. Then, $\rho_t$ is regularizable.

Proof. By compactness of $Y^t$ there is a compact convex neighborhood U of
$Y^t$ and an open convex neighborhood V of U such that the restriction of $\rho_t$
to V is twice continuously differentiable in $\eta^t$, i.e. the second order partial
derivatives of $\rho_t$ with respect to the stochastic variables are continuous on V.
Consider the assignment which maps any vector in V to the Hessian submatrix
corresponding to the components of $\eta^t$:

$$(x^t,\eta^t,\xi^t) \mapsto \nabla_{\eta^t}\otimes\nabla_{\eta^t}\,\rho_t(x^t,\eta^t) \in \mathbb{R}^{K^t\times K^t}.$$
By assumption, this matrix-valued function is continuous on V. Compactness
of $U \subseteq V$ and continuity of the matrix 2-norm (denoted by $\|\cdot\|_2$) imply that

$$c_t := \sup\big\{\,\|\nabla_{\eta^t}\otimes\nabla_{\eta^t}\,\rho_t(x^t,\eta^t)\|_2 \;\big|\; (x^t,\eta^t,\xi^t) \in U\big\}$$

is finite and nonnegative. Thus, the regularized matrix $\nabla_{\eta^t}\otimes\nabla_{\eta^t}\,\rho_t + c_t\, I_{K^t}$
is positive semidefinite on U, where $I_{K^t}$ denotes the $K^t$-dimensional identity
matrix. Moreover, the function $\rho_t(x^t,\eta^t) + \tfrac{c_t}{2}\langle\eta^t,\eta^t\rangle$ is convex in $\eta^t$ on U.
Based upon these notions we can define

$$\alpha_t(\eta^t) := \tfrac{c_t}{2}\,\langle\eta^t,\eta^t\rangle.$$

By construction, $\rho_t + \alpha_t$ is a continuous saddle function on a convex neighborhood
of $Y^t$. Thus, $\rho_t$ is regularizable. □
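The construction in the proof can be mimicked numerically. The sketch below assumes a scalar random parameter ($K^t = 1$, so the matrix 2-norm of the Hessian block is just an absolute value), replaces the supremum over U by a maximum over a finite grid, and approximates second derivatives by central finite differences. The profit function `rho` and both grids are hypothetical; because a grid maximum can underestimate the true supremum $c_t$, a safety margin would be prudent in practice.

```python
import math
from typing import Callable, List

Profit = Callable[[float, float], float]


def second_diff_eta(rho: Profit, x: float, eta: float, h: float = 1e-3) -> float:
    """Central finite-difference approximation of d^2 rho / d eta^2 at (x, eta)."""
    return (rho(x, eta + h) - 2.0 * rho(x, eta) + rho(x, eta - h)) / (h * h)


def regularization_constant(rho: Profit, xs: List[float], etas: List[float]) -> float:
    """Grid estimate of c_t = sup |d^2 rho / d eta^2|; for a scalar eta the
    matrix 2-norm of the 1x1 Hessian block is just its absolute value."""
    return max(abs(second_diff_eta(rho, x, e)) for x in xs for e in etas)


def rho(x: float, eta: float) -> float:
    """Hypothetical profit: concave in x, neither convex nor concave in eta."""
    return -x * x + x * math.sin(eta)


xs = [i / 20 for i in range(21)]                      # decision grid on [0, 1]
etas = [2.0 * math.pi * j / 200 for j in range(201)]  # outcome grid on [0, 2*pi]
c_t = regularization_constant(rho, xs, etas)


def alpha(eta: float) -> float:
    """Correction term alpha_t(eta) = (c_t / 2) * eta^2, as in the proof."""
    return 0.5 * c_t * eta * eta


def regularized(x: float, eta: float) -> float:
    return rho(x, eta) + alpha(eta)


min_curvature = min(second_diff_eta(regularized, x, e) for x in xs for e in etas)
print(f"c_t ~ {c_t:.6f}; smallest eta-curvature of rho + alpha ~ {min_curvature:.1e}")
```

Here the exact constant is $c_t = 1$ (attained at $x = 1$, $\eta = \pi/2$), and the regularized function has nonnegative curvature in $\eta$ up to rounding error.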

In the remainder of this section we consider a stochastic program of the
form (3.1) subject to the regularity conditions (C1)-(C5). For every $t \in \tau$, $\alpha_t$
denotes a suitable correction term as in definition 5.1, which is assumed to be
given. It should be emphasized again that these correction terms are required
to be continuous and constant in the decision variables. In order to simplify
notation, let us introduce three sequences of additional random variables. For
every $t \in \tau$ define



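$$\begin{aligned}
A_t(\eta^t) &:= E\Big[\textstyle\sum_{s=t}^{T}\alpha_s(\eta^s)\,\Big|\,\eta^t\Big],\\
A^l_t(\eta^t) &:= E^l\Big[\textstyle\sum_{s=t}^{T}\alpha_s(\eta^s)\,\Big|\,\eta^t\Big], \qquad (5.1)\\
A^u_t(\eta^t) &:= E^u\Big[\textstyle\sum_{s=t}^{T}\alpha_s(\eta^s)\,\Big|\,\eta^t\Big].
\end{aligned}$$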

Thus, $A_t$, $A^l_t$, and $A^u_t$ are given by the conditional expectation values of the
future correction terms with respect to the probability measures P, $P^l$, and
$P^u$, respectively. In addition, we will need the unconditional expectation of
the sum of all correction terms with respect to the three probability measures
under consideration:

$$(EA_0) := E[A_0(\eta_0)],\qquad (E^lA^l_0) := E^l[A^l_0(\eta_0)],\qquad (E^uA^u_0) := E^u[A^u_0(\eta_0)]. \qquad (5.2)$$
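Because $P^l$ and $P^u$ are discrete, the conditional correction terms under these measures reduce to finite sums that can be accumulated backwards over a scenario tree. The sketch below assumes that a discrete measure is stored as a tree whose nodes carry the outcome history $\eta^s$ and the conditional probabilities of their children; the tree, the correction term `alpha` (which corresponds to $c_s = 2$ at every stage), and all names are hypothetical and not the barycentric trees generated by (4.36).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

History = Tuple[float, ...]


@dataclass
class Node:
    """Node of a discrete scenario tree; `history` is the path eta^s = (eta_0, ..., eta_s)."""
    history: History
    children: List[Tuple[float, "Node"]] = field(default_factory=list)  # (conditional prob., child)


def conditional_correction(node: Node, alpha: Callable[[History], float]) -> float:
    """Conditional expectation of sum_{r >= s} alpha_r(eta^r) given the node's history.

    Backward recursion: alpha at this node plus the probability-weighted values of its children."""
    return alpha(node.history) + sum(
        p * conditional_correction(child, alpha) for p, child in node.children
    )


def alpha(history: History) -> float:
    """Hypothetical correction term alpha_s(eta^s) = <eta^s, eta^s>."""
    return sum(v * v for v in history)


def leaf(*history: float) -> Node:
    return Node(history=tuple(history))


root = Node((0.0,), [
    (0.5, Node((0.0, 1.0), [(0.5, leaf(0.0, 1.0, 2.0)), (0.5, leaf(0.0, 1.0, 0.0))])),
    (0.5, Node((0.0, -1.0), [(0.5, leaf(0.0, -1.0, -2.0)), (0.5, leaf(0.0, -1.0, 0.0))])),
])
print(conditional_correction(root, alpha))  # A_0 at the deterministic root
```

Since the stage 0 outcome is deterministic, the value at the root also equals the unconditional constant in (5.2) for the measure the tree represents.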

The random variables (5.1) as well as the constants (5.2) will be referred
to as conditional correction terms below. It is important to notice that the
conditional correction terms are computationally accessible. Evaluation of $A^l_t$
and $A^u_t$ requires calculation of a finite sum, while $A_t$ can be evaluated by
means of numerical integration techniques. Since $\alpha_t$ is continuous for every
$t \in \tau$, $A_t$, $A^l_t$, and $A^u_t$ are continuous and bounded on $\Theta^t$. Moreover, by
convexity of the correction terms we find $A^l_t \le A_t \le A^u_t$ on $\Theta^t$ (use a similar
argument as in the proof of theorem 4.7), although $A^l_t$ and $A^u_t$ are not necessarily
convex. With the above conventions we are now prepared to define an
adjusted recourse problem



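$$\begin{aligned}
\sup\ \ & \int \sum_{t=0}^{T}\big[\rho_t(x^t,\eta^t) + \alpha_t(\eta^t)\big]\,dP(\eta,\xi)\\
\text{s.t.}\ \ & f_t(x^t,\xi^t) \le 0 \quad P\text{-a.s.},\ t \in \tau,
\end{aligned} \qquad (5.3)$$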



where the stage t profit function is given by $\rho_t + \alpha_t$. Let $\Psi_t$ be the optimal
value function of stage t associated with the stochastic program (5.3).
As the correction terms $\{\alpha_t\}_{t\in\tau}$ are independent of the decision variables,
they have no influence on the optimal policy and only cause a shift in the
recourse functions and the optimal objective function value. Moreover, denote
by $\Psi^l_t$ and $\Psi^u_t$ the auxiliary value functions of stage t corresponding to
the recourse problem (5.3). The following proposition 5.4 summarizes some
important properties of the adjusted recourse problem and the associated
auxiliary value functions.



Proposition 5.4. Consider a multistage stochastic program subject to the
conditions (C1)-(C5), and define the adjusted recourse problem as in (5.3).
Then, we may conclude:

(a) $\Psi^l_t \le \Psi_t \le \Psi^u_t$ on $Z^t$ for all $t \in \tau$;

(b) $(E^l\Psi^l_0) \le (E\Psi_0) \le (E^u\Psi^u_0)$;

(c) $\Psi_t = \Phi_t + A_t$ and $\Psi^d_t = \Phi^d_t + A^d_t$ ($d \in \{l,u\}$, $t \in \tau$);

(d) $(E\Psi_0) = (E\Phi_0) + (EA_0)$ and $(E^d\Psi^d_0) = (E^d\Phi^d_0) + (E^dA^d_0)$ ($d \in \{l,u\}$).

Proof. By construction, the profit functions $\{\rho_t + \alpha_t\}_{t\in\tau}$ satisfy the regularity
condition (B2). Consequently, the adjusted recourse problem (5.3) satisfies
each of the conditions (B1)-(B5). This observation implies via theorems
3.14 and 3.18 that $\Psi_t$ is subdifferentiable and exhibits a saddle structure
on a convex neighborhood of $Z^t$ ($t \in \tau$). In particular, the assumptions of
theorem 4.7 are fulfilled, and the barycentric approximation scheme applies,
yielding upper and lower bounds on the recourse function $\Psi_t$. Hence the assertions
(a) and (b) follow. The statements (c) and (d) are straightforward and
follow directly from the definition of the adjusted recourse problem and the
fact that the correction terms are independent of the decision variables. □



Theorem 5.5 (Adjusted Bounds I). Consider a stochastic program satisfying
the regularity conditions (C1)-(C5). Then, the correction terms $\{\alpha_t\}_{t\in\tau}$
can be used to construct (finite) bounds on the recourse function, i.e. we find

$$\Phi^l_t + A^l_t - A_t \le \Phi_t \le \Phi^u_t + A^u_t - A_t \quad \text{on } Z^t \text{ for all } t \in \tau.$$

Analogously, the optimal objective function value has bounds:

$$(E^l\Phi^l_0) + (E^lA^l_0) - (EA_0) \le (E\Phi_0) \le (E^u\Phi^u_0) + (E^uA^u_0) - (EA_0).$$

Proof. The claim follows directly from proposition 5.4. □
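A single-stage analogue of theorem 5.5 can be checked by hand. The sketch assumes one random parameter $\eta$ uniform on [0, 1], no decisions, and the hypothetical nonconvex profit $\rho(\eta) = \sin 3\eta$; it uses the construction of proposition 5.3 with $c = 9 \ge \sup|\rho''|$, so that $\rho + \alpha$ is convex. On each cell the lower and upper measures are taken to be the one-dimensional Jensen and Edmundson-Madansky measures (our simplifying assumption), and $A = E[\alpha(\eta)] = c/6$ is computed exactly.

```python
import math
from typing import Callable, Tuple


def adjusted_bounds(
    rho: Callable[[float], float],
    alpha: Callable[[float], float],
    mean_alpha: float,
    n_cells: int,
) -> Tuple[float, float]:
    """Single-stage analogue of theorem 5.5 for eta ~ Uniform[0, 1] and no decisions:
    returns (Phi^l + A^l - A, Phi^u + A^u - A) as bounds on E[rho(eta)]."""
    phi_l = a_l = phi_u = a_u = 0.0
    for i in range(n_cells):
        lo, hi = i / n_cells, (i + 1) / n_cells
        p, mid = 1.0 / n_cells, 0.5 * (lo + hi)
        # lower measure: cell mass at the conditional mean (Jensen)
        phi_l += p * rho(mid)
        a_l += p * alpha(mid)
        # upper measure: cell mass split over the endpoints, mean-preserving (1/2 each here)
        phi_u += p * 0.5 * (rho(lo) + rho(hi))
        a_u += p * 0.5 * (alpha(lo) + alpha(hi))
    return phi_l + a_l - mean_alpha, phi_u + a_u - mean_alpha


C = 9.0  # sup |d^2/d eta^2 sin(3 eta)| = 9


def rho(eta: float) -> float:
    return math.sin(3.0 * eta)  # nonconvex in eta on [0, 1]


def alpha(eta: float) -> float:
    return 0.5 * C * eta * eta  # correction term; rho + alpha is convex


mean_alpha = C / 6.0                 # A = E[alpha(eta)] = (C/2) * E[eta^2] = C/6
exact = (1.0 - math.cos(3.0)) / 3.0  # E[sin(3 eta)] for eta ~ Uniform[0, 1]
results = {n: adjusted_bounds(rho, alpha, mean_alpha, n) for n in (1, 2, 4, 8, 16)}
for n, (low, up) in results.items():
    print(f"cells={n:2d}  {low:.5f} <= {exact:.5f} <= {up:.5f}")
```

Applying Jensen and Edmundson-Madansky directly to the nonconvex $\rho$ would give no guaranteed bounds; the shift by the correction terms restores them, and the bracket narrows as the partition is refined.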

Theorem 5.6 (Convergence). Consider a stochastic program satisfying the
regularity conditions (C1)-(C5). If the barycentric probability measures are
regularly refined in the sense of (4.37), then the random variables $A^l_t$ and $A^u_t$
converge to $A_t$ uniformly on $\Theta^t$, while the auxiliary recourse functions $\Phi^l_t$ and
$\Phi^u_t$ converge to $\Phi_t$ uniformly on $Z^t$ for all $t \in \tau$.

Proof. By assumption, the correction term $\alpha_t$ is uniformly continuous on $\Theta^t$
for every $t \in \tau$. Thus, convergence of $A^l_t$ and $A^u_t$ to $A_t$ uniformly on $\Theta^t$ ($t \in \tau$)
follows from the weak convergence of the discrete barycentric measures to the
original probability measure. Moreover, theorem 4.8 applied to the adjusted
recourse problem (5.3) ensures convergence of $\Psi^l_t$ and $\Psi^u_t$ to $\Psi_t$ uniformly on
$Z^t$ for all $t \in \tau$. Combining the above results, by proposition 5.4 (c) we have
convergence of $\Phi^l_t$ and $\Phi^u_t$ to $\Phi_t$ uniformly on $Z^t$ for all $t \in \tau$. □



5.2 Stochasticity of the Constraint Functions 

This section is devoted to the study of generalized constraint functions. Concretely
speaking, we aim at modifying assumption (B3) in order to allow for
constraint functions with a generalized nonconvex $\xi$-dependence. For the further
argumentation we need the following definition.

Definition 5.7. For any $t \in \tau$ the constraint function $f_t$ is called regularizable
if there is a continuous vector-valued mapping $\kappa_t : \mathbb{R}^{L^t} \to \mathbb{R}^{r_t}$ with
the following properties: $\kappa_t$ is convex in $\xi^t$ on a convex neighborhood of $\Xi^t$,
while $f_t + \kappa_t$ is jointly convex in $(x^t,\xi^t)$ and constant in $\eta^t$ on a convex
neighborhood of $Y^t$.

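It is convenient to write the mapping $\kappa_t$ as

$$\kappa_t = (\kappa^{in}_t, \kappa^{eq+}_t, \kappa^{eq-}_t), \quad\text{where}\quad \kappa^{in}_t : \mathbb{R}^{L^t} \to \mathbb{R}^{r^{in}_t},\ \ \kappa^{eq+}_t : \mathbb{R}^{L^t} \to \mathbb{R}^{r^{eq}_t},\ \ \kappa^{eq-}_t : \mathbb{R}^{L^t} \to \mathbb{R}^{r^{eq}_t}.$$

This representation reflects the grouping of $f_t = (f^{in}_t, f^{eq}_t, -f^{eq}_t)$ by inequality
and equality constraints. Notice that any regularizable constraint function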
$f_t$ is necessarily convex in $x^t$, constant in $\eta^t$, and locally Lipschitz³ on a convex
neighborhood of $Y^t$. Local Lipschitz continuity follows from the fact that
$f_t$ can be written as the difference of the functions $f_t + \kappa_t$ and $\kappa_t$, which
are both convex and locally Lipschitz on a convex neighborhood of $Y^t$. Below,
we will prove compactness of $Y^t$, which allows us to conclude that $f_t$ is
even (globally) Lipschitz on some compact convex neighborhood of $Y^t$. But
first, an important result about linear constraints is provided in the following
proposition.

Proposition 5.8. Consider a constraint function $f_t$ which is linear affine in
the decision variables, i.e.

$$f_t(x^t,\xi^t) = W_t(\xi^t)\,x_t + T_t(\xi^t)\,x^{t-1} - h_t(\xi^t), \qquad (5.4)$$

where $W_t$, $T_t$, and $h_t$ are matrix- and vector-valued mappings on $\mathbb{R}^{L^t}$ with
appropriate dimensions. If $f_t$ is regularizable, then $W_t$ and $T_t$ are constant
and $h_t$ is representable as a difference of two convex functions on a convex
neighborhood of $\Xi^t$.

Proof. Since the constraint function $f_t$ is regularizable, there is a vector-valued
mapping $\kappa_t$ as in definition 5.7. Recall that such a mapping is continuous
on the entire space and convex on a convex neighborhood of $\Xi^t$. Applying
proposition D.1 in appendix D to each component of $f_t + \kappa_t$ separately and
for every reference point in $Y^t$, we may conclude that the matrices $W_t$ and
$T_t$ are locally constant while the difference function $\kappa_t - h_t$ is locally convex
on a convex neighborhood of $\Xi^t$. By connectedness of convex sets, $W_t$ and
$T_t$ are constant on a convex neighborhood of $\Xi^t$. Convexity of $f_t + \kappa_t$ then
entails convexity of $\kappa_t - h_t$ on a convex neighborhood of $\Xi^t$. Consequently,
the mapping

$$h_t = \kappa_t - (\kappa_t - h_t)$$

is representable as a difference of two convex functions on a convex neighborhood
of $\Xi^t$. This observation completes the proof. □
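For illustration only (the data below are hypothetical and not taken from the text), consider a scalar constraint of the form (5.4) with constant coefficients $w$, $v$ and the nonconvex right-hand side $h_t(\xi_t) = \sin\xi_t$. A quadratic choice of $\kappa_t$, in the spirit of the correction terms of proposition 5.3, makes the difference-of-convex representation from the proof explicit:

```latex
f_t(x^t,\xi^t) = w\,x_t + v\,x_{t-1} - \sin\xi_t, \qquad \kappa_t(\xi_t) := \tfrac{1}{2}\,\xi_t^2,
\\[4pt]
\frac{\partial^2 \kappa_t}{\partial \xi_t^2} = 1 \ge 0, \qquad
\frac{\partial^2 (f_t + \kappa_t)}{\partial \xi_t^2} = 1 + \sin\xi_t \ge 0,
\\[4pt]
h_t = \kappa_t - (\kappa_t - h_t) = \tfrac{1}{2}\,\xi_t^2 - \Big(\tfrac{1}{2}\,\xi_t^2 - \sin\xi_t\Big).
```

Since $f_t + \kappa_t$ is the sum of a linear function of $(x_t, x_{t-1})$ and a convex function of $\xi_t$, it is jointly convex, so this $f_t$ is regularizable. More generally, any twice continuously differentiable scalar $h_t$ whose second derivative is bounded in absolute value by some $c$ on a convex neighborhood of $\Xi^t$ admits the choice $\kappa_t(\xi_t) = \tfrac{c}{2}\,\xi_t^2$.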

With obvious modifications, proposition 5.8 applies to the linear affine
components of any regularizable constraint function. For example, consider
a regularizable constraint function $f_t = (f^{in}_t, f^{eq}_t, -f^{eq}_t)$ which accounts for
both equality and inequality constraints. Then, both $f^{eq}_t$ and $-f^{eq}_t$
are convex in $x^t$ for fixed values of the stochastic parameters. This implies
that $f^{eq}_t$ is of the form (5.4), and the assertions of proposition 5.8 hold true.
On the other hand, we know that a constraint function $f_t$ which is linear
affine in the decision variables can always be brought to the form (5.4). Then,
3 Recall that if the single constraint function $f_{t,i}$ has Lipschitz constant $\Lambda_{t,i}$ for
$i = 1,\dots,r_t$, then $f_t$ has Lipschitz constant $\Lambda_t := \big(\sum_{i=1}^{r_t}\Lambda_{t,i}^2\big)^{1/2}$.
proposition 5.8 implies that $f_t$ is not regularizable if the coefficient matrices
$W_t$ or $T_t$ depend nontrivially on the random parameters, i.e. if either $W_t$ or $T_t$
represents a nonconstant function of $\xi^t$. This observation will be important
for the study of linear stochastic programs in Sect. 5.4. Furthermore, in the
remainder of this section, proposition 5.8 will be used to prove compactness of
the generalized feasible sets $\{Y^t\}_{t\in\tau}$ corresponding to a system of regularizable
constraint functions.

Let us now study the properties of a multistage stochastic program of the 
form (3.1) subject to the following regularity conditions: 



(D1) = (B1); (D2) = (B2);

(D3) - the constraint function $f_t$ is continuous on the underlying Euclidean
       space and globally convex in $x^t$ for fixed values of the
       stochastic parameters, $t \in \tau$;

     - $f_t$ is regularizable for every $t \in \tau$;

     - the feasible set mapping $X_t$ is bounded on a neighborhood of $Z^t$,
       $t \in \tau$;

(D4) = (B4); (D5) = (B5).



These conditions are assumed to hold throughout this section. One should
pay attention to a subtlety in the assumptions (D2) and (D3). In fact, these
regularity conditions are required to hold on a convex neighborhood of $Y^t$,
although $Y^t$ itself is not necessarily convex. Thus, (D2) and (D3) must occasionally
hold far beyond the natural domain of the profit and constraint
functions (and not just on some $\varepsilon$-neighborhood of $Y^t$).

Note that condition (D3) includes condition (B3) as a special case. In
fact, if (B3) holds, the trivial mappings $\{\kappa_t = 0\}_{t\in\tau}$ suffice to regularize the
constraint functions of all decision stages. Moreover, in proposition 5.11 we
will present a broad class of regularizable constraint functions which do not
satisfy assumption (B3).

Proposition 5.9. Consider a stochastic program subject to the regularity conditions (D1)-(D5). Then, for all t ∈ τ we have:

(a) the feasible set mapping X_t is non-empty-compact-valued and usc on a neighborhood of Z^t;

(b) for any neighborhood U of Y^t there is a neighborhood V of Z^t such that graph X_t|_V ⊂ U.

Proof. By proposition 5.8, the restriction of f_t^{eq} to some neighborhood of Y^t is representable as a sum of a linear function independent of ξ^t and a Lipschitzian function independent of x^t for all t ∈ τ. This result guarantees that the propositions 3.7 through 3.9 remain valid under the new regularity conditions (D1)-(D5) without modification of the corresponding proofs. □

Suppressing the refinement parameter J ∈ N to simplify notation, we introduce the auxiliary value functions {Φ^l_t}_{t∈τ} and {Φ^u_t}_{t∈τ} as in (4.39) and the discrete measures P^l and P^u as in (4.36). The following proposition 5.10 ensures that the original as well as the auxiliary value functions are well-defined.

Proposition 5.10. Consider a multistage stochastic program satisfying the regularity conditions (D1)-(D5). Then the following hold:

(a) the transition probabilities P^l_t and P^u_t are well-defined for t ∈ τ and define two discrete probability measures P^l and P^u on (Θ × Ξ, B(Θ × Ξ));

(b) the original stochastic program complies with the conditions (A1)-(A5), and the associated auxiliary programs satisfy (A1)''-(A5)'';

(c) the value functions Φ_t, Φ^l_t, and Φ^u_t are concave in the decision vector x^{t-1} for all t ∈ τ.

Proof. Assertion (a) follows from (D1) and (D4). Arguing as in Sect. 3.3, one may use proposition 5.9 to show that the regularity conditions (D1)-(D5) entail the more fundamental conditions (A1)-(A5). As the barycentric measures are discrete, we may then conclude that the auxiliary stochastic programs satisfy (A1)''-(A5)''. Thus, (b) follows. Finally, statement (c) is proved precisely as in proposition 4.5. □

Statement (b) of proposition 5.10 implies that the static and dynamic versions of the stochastic program under consideration are both solvable and equivalent. The same is true for the associated auxiliary stochastic programs. Moreover, the generalized feasible sets Y^t and Z^t are compact for each t ∈ τ. Compactness of the sets {Y^t}_{t∈τ} helps us to identify an important class of regularizable constraint functions, as pointed out in the following proposition.

Proposition 5.11. For some t ∈ τ assume that Y^t is compact, and the constraint function f_t is additively separable, i.e. f_t = f_t^1 + f_t^2. Furthermore, we postulate

(i) f_t^1 is a continuous convex mapping on a convex neighborhood of Y^t being jointly convex in (x^t, ξ^t) and constant in η^t;

(ii) f_t^2 is a smooth mapping on a convex neighborhood of Y^t being twice continuously differentiable in ξ^t and constant in (x^t, η^t).

Then, f_t is regularizable.
Proof. By compactness of Y^t and assumption (ii) there is a compact convex neighborhood U of Y^t and an open convex neighborhood V of U such that the component f_t^2 is twice continuously differentiable on V. Consider the assignment which maps any vector in V to the Hessian submatrix of f^2_{t,i} corresponding to the stochastic variables (i = 1, ..., r_t).

$$
(x^t, \eta^t, \xi^t) \mapsto \nabla_{\xi^t} \otimes \nabla_{\xi^t} f^2_{t,i}(\xi^t) \in \mathbb{R}^{L^t \times L^t}
$$

By assumption, this matrix-valued function is continuous on V. Compactness of U ⊂ V and continuity of the matrix 2-norm imply that

$$
c_{t,i} := \sup \big\{ \| \nabla_{\xi^t} \otimes \nabla_{\xi^t} f^2_{t,i}(\xi^t) \|_2 \;\big|\; (x^t, \eta^t, \xi^t) \in U \big\}
$$

is finite and nonnegative. Thus, the regularized function f^2_{t,i}(ξ^t) + c_{t,i}⟨ξ^t, ξ^t⟩ is convex in ξ^t on U. Based upon these notions we can define

$$
\kappa_{t,i}(\xi^t) := c_{t,i} \langle \xi^t, \xi^t \rangle \quad \forall i = 1, \dots, r_t, \qquad \kappa_t := (\kappa_{t,1}, \dots, \kappa_{t,r_t}).
$$

By construction, f_t + κ_t is a continuous convex function on a convex neighborhood of Y^t being jointly convex in (x^t, ξ^t) and constant in η^t. This implies that f_t is regularizable. □
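The construction of κ_t in this proof can be mimicked numerically. The sketch below is a minimal illustration under stated assumptions: a single component (r_t = 1), a two-dimensional stochastic vector ξ^t = (a, b), the hypothetical choice f^2_{t,1}(a, b) = sin(a) cos(b), and the box [-1, 1]^2 standing in for U. The supremum defining c_{t,1} is only estimated on a grid, and a grid estimate can understate the true supremum, so in practice one would add a safety margin. For this particular function the Hessian eigenvalues are -sin(a + b) and -sin(a - b), so c = 1 is a valid global choice.

```python
import math


def hessian(a, b):
    """Analytic Hessian of the illustrative function g(a, b) = sin(a) * cos(b)."""
    h11 = -math.sin(a) * math.cos(b)
    h12 = -math.cos(a) * math.sin(b)
    h22 = -math.sin(a) * math.cos(b)
    return h11, h12, h22


def sym_eigs(h11, h12, h22):
    """Eigenvalues (ascending) of the symmetric 2x2 matrix [[h11, h12], [h12, h22]]."""
    mean = 0.5 * (h11 + h22)
    radius = math.hypot(0.5 * (h11 - h22), h12)
    return mean - radius, mean + radius


def grid(n=41, lo=-1.0, hi=1.0):
    """n equally spaced points on [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]


# Sample the box [-1, 1]^2, which plays the role of the compact set U.
points = [(a, b) for a in grid() for b in grid()]

# Grid estimate of c = sup ||Hessian||_2; for a symmetric matrix the spectral
# norm is the largest absolute eigenvalue.
c = max(max(abs(e) for e in sym_eigs(*hessian(a, b))) for a, b in points)

# g alone is not convex on the box: some Hessian has a negative eigenvalue.
min_eig_plain = min(sym_eigs(*hessian(a, b))[0] for a, b in points)

# The Hessian of g + c * <xi, xi> is H + 2c I, so every eigenvalue shifts by 2c.
min_eig_regularized = min(sym_eigs(*hessian(a, b))[0] + 2.0 * c for a, b in points)

print(f"c = {c:.4f}, min eig before = {min_eig_plain:.4f}, after = {min_eig_regularized:.4f}")
```

Since the regularized Hessian is H + 2cI, adding only (c/2)⟨ξ^t, ξ^t⟩ would already yield convexity; the choice c_{t,i}⟨ξ^t, ξ^t⟩ in the proof simply errs on the safe side.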

In the proof of proposition 5.11 we make use of separability to show that the constraint function f_t is regularizable. Alternatively, one can also show f_t to be regularizable if it is twice continuously differentiable in (x^t, ξ^t), constant in η^t, and strictly convex in x^t on a convex neighborhood of Y^t. Thus, loosely speaking, additive separability or strict convexity in the decision variables are both sufficient to imply regularizability. Unfortunately, strict convexity is rarely seen in realistic problems, which are predominantly governed by linear constraints.

In the remainder, we consider a stochastic program of the form (3.1) subject to the new regularity conditions (D1)-(D5). For every t ∈ τ, κ_t denotes a suitable mapping as in definition 5.7, which is assumed to be given.

Now, consider the dynamic version of the original stochastic program. The parametric stage t subproblem reads

$$
\begin{aligned}
\sup_{x_t \in \mathbb{R}^{n_t}} \;& q_t(x^t, \eta^t, \xi^t) \\
\text{s.t.} \;& f_t^{in}(x^t, \xi^t) \le 0 \\
& f_t^{eq}(x^t, \xi^t) = 0,
\end{aligned}
\tag{5.5}
$$

where q_t := p_t + (E_t Φ_{t+1}) for t ∈ τ_-, and q_T := p_T. Notice that this representation explicitly distinguishes inequality and equality constraints. Let

$$
d_t^{in} = d_t^{in}(x^{t-1}, \eta^t, \xi^t) \quad \text{and} \quad d_t^{eq} = d_t^{eq}(x^{t-1}, \eta^t, \xi^t)
$$

be any pair of history-dependent Lagrange multipliers associated with the inequality and equality constraints in (5.5), respectively. Notice that (d_t^{in}, d_t^{eq}) always exists, although it is not unique in general; see appendix B for an introduction to Lagrangian duality. In order to allow for a uniform treatment of inequality and equality constraints, we assign to every pair (d_t^{in}, d_t^{eq}) an r_t-dimensional vector

$$
d_t := \big(d_t^{in}, [d_t^{eq}]^+, [d_t^{eq}]^-\big),
$$

where

$$
[d]^{\pm} := \max\{\pm d, 0\} \quad \forall d \in \mathbb{R}^{r_t^{eq}},
$$

and 'max' denotes the componentwise maximization operator. By construction, d_t lies in the nonnegative orthant of R^{r_t} and can formally be interpreted as a Lagrange multiplier corresponding to the unified constraint function f_t = (f_t^{in}, f_t^{eq}, -f_t^{eq}). Next, we introduce a vector D_t ∈ R^{r_t} which is an upper bound for the multipliers d_t(x^{t-1}, η^t, ξ^t) uniformly over all outcome and decision histories in some neighborhood of Z^t. At present, we simply anticipate that such a vector can be found under the assumptions (D1)-(D5).
A formal proof of its existence is postponed to proposition 5.12. For the further argumentation, it is convenient to introduce a non-anticipative stochastic process {α_t}_{t∈τ}, where the random variables

$$
\alpha_t(\xi^t) := -\langle D_t, \kappa_t(\xi^t) \rangle \tag{5.6}
$$

will be denoted as correction terms. As in the previous section, we try to argue that the adjusted recourse problem with profit functions {p_t + α_t}_{t∈τ} has nice structural properties in view of the barycentric approximation scheme. Again, the correction terms {α_t}_{t∈τ} are continuous and independent of the decision variables. In order to simplify notation, we introduce three sequences of additional random variables. For t ∈ τ we define

$$
\begin{aligned}
A_t(\xi^t) &:= E_t\Big[\sum_{s=t}^{T} \alpha_s(\xi^s)\Big], \\
A^l_t(\xi^t) &:= E^l_t\Big[\sum_{s=t}^{T} \alpha_s(\xi^s)\Big], \\
A^u_t(\xi^t) &:= E^u_t\Big[\sum_{s=t}^{T} \alpha_s(\xi^s)\Big].
\end{aligned}
\tag{5.7}
$$

Thus, A_t, A^l_t, and A^u_t are given by the conditional expectation of the future correction terms with respect to the probability measures P, P^l, and P^u, respectively. Finally, we set

$$
\begin{aligned}
(E A_0) &:= E[A_0(\xi_0)], \\
(E^l A^l_0) &:= E^l[A^l_0(\xi_0)], \\
(E^u A^u_0) &:= E^u[A^u_0(\xi_0)].
\end{aligned}
\tag{5.8}
$$
The random variables (5.7) as well as the constants (5.8) will be referred to as conditional correction terms below. It is important to notice that the conditional correction terms are computationally accessible. Evaluation of A^l_t and A^u_t requires calculation of a finite sum, while A_t can be evaluated by means of numerical integration techniques. Since α_t is continuous for every t ∈ τ, A_t, A^l_t, and A^u_t are continuous and bounded on Ξ^t. Moreover, by concavity of the correction terms we find A^l_t ≤ A_t ≤ A^u_t on Ξ^t, although A^l_t and A^u_t are not necessarily concave (use a similar argument as in the proof of theorem 4.7).
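The ordering A^l_t ≤ A_t ≤ A^u_t and the remark on computational accessibility can be checked in the simplest conceivable setting. The sketch below assumes a single stage, a scalar ξ uniformly distributed on [0, 1], a hypothetical bound D = 2, and the hypothetical convex map κ(ξ) = exp(ξ). It further assumes that in this one-dimensional case the lower barycentric measure puts mass 1/2 on each endpoint and the upper one puts unit mass at the mean (the Edmundson-Madansky and Jensen measures). A is obtained by Simpson quadrature, whereas A^l and A^u are finite sums.

```python
import math

D = 2.0  # hypothetical scalar bound D_t >= 0 on the multiplier


def kappa(xi):
    """Hypothetical convex regularizing map kappa_t(xi) = exp(xi)."""
    return math.exp(xi)


def alpha(xi):
    """Correction term alpha(xi) = -D * kappa(xi), concave because D >= 0."""
    return -D * kappa(xi)


def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    total += 4.0 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    total += 2.0 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return total * h / 3.0


# xi is assumed uniform on [0, 1] (density 1), so E[alpha] is an integral.
A = simpson(alpha, 0.0, 1.0)                    # numerical integration
A_lower = 0.5 * alpha(0.0) + 0.5 * alpha(1.0)   # finite sum: mass on the endpoints
A_upper = alpha(0.5)                            # finite sum: mass at the barycenter

print(f"{A_lower:.5f} <= {A:.5f} <= {A_upper:.5f}")
```

For these choices the three numbers are approximately -3.718, -3.437, and -3.297, matching the closed forms -(1 + e), -2(e - 1), and -2e^{1/2}.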
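As inspired by the previous section, we define the adjusted recourse problem as

$$
\begin{aligned}
\sup_{x \in \mathcal{N}^n} \;& \int_{\Theta \times \Xi} \sum_{t=0}^{T} \big[ p_t(x^t, \eta^t) + \alpha_t(\xi^t) \big] \, dP(\eta, \xi) \\
\text{s.t.} \;& f_t(x^t, \xi^t) \le 0 \quad P\text{-a.s.}, \quad t \in \tau.
\end{aligned}
\tag{5.9}
$$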



Let Ψ_t = Φ_t + A_t be the optimal value function of stage t associated with the stochastic program (5.9). As the correction terms are independent of the decision variables, they have no influence on the optimal policy and only cause a shift in the recourse functions and the optimal objective function value. Moreover, denote by Ψ^l_t = Φ^l_t + A^l_t and Ψ^u_t = Φ^u_t + A^u_t the lower and upper auxiliary value functions of stage t, respectively, corresponding to the adjusted recourse problem (5.9).
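The multiplier bookkeeping that precedes (5.6), where the bounding vector D_t entering the correction terms was introduced, is easy to mirror in code. The sketch below uses hypothetical multiplier values and constraint values at a single outcome and decision history (none of them come from the text). It checks three facts stated there: d_t is nonnegative, pairing d_t with the unified constraint (f_t^{in}, f_t^{eq}, -f_t^{eq}) reproduces ⟨d_t^{in}, f_t^{in}⟩ + ⟨d_t^{eq}, f_t^{eq}⟩, and a bounding vector D_t dominates d_t componentwise.

```python
def positive_part(v):
    """Componentwise [v]^+ = max{v, 0}."""
    return [max(x, 0.0) for x in v]


def negative_part(v):
    """Componentwise [v]^- = max{-v, 0}."""
    return [max(-x, 0.0) for x in v]


def unified_multiplier(d_in, d_eq):
    """d = (d_in, [d_eq]^+, [d_eq]^-), which lies in the nonnegative orthant."""
    return list(d_in) + positive_part(d_eq) + negative_part(d_eq)


def unified_constraint(f_in, f_eq):
    """Unified constraint values (f_in, f_eq, -f_eq)."""
    return list(f_in) + list(f_eq) + [-x for x in f_eq]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical multipliers and constraint values at a single history.
d_in, d_eq = [0.5, 0.0], [1.5, -2.0]
f_in, f_eq = [-0.25, -1.0], [0.3, 0.7]

d = unified_multiplier(d_in, d_eq)
f = unified_constraint(f_in, f_eq)

# The unified pairing reproduces the original one because [d]^+ - [d]^- = d.
lhs = dot(d, f)
rhs = dot(d_in, f_in) + dot(d_eq, f_eq)

# A hypothetical uniform bound D_t must dominate d componentwise.
D = [1.0, 1.0, 2.0, 2.0, 2.0, 2.0]
print(d, lhs, rhs, all(di <= Di for di, Di in zip(d, D)))
```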

In the remainder of this section we will argue that the optimal value func- 
tions of the adjusted recourse problem (5.9) are subdifferentiable and exhibit 
a specific saddle structure. This requires more sophisticated arguments than 
in Sect. 5.1 as the adjusted recourse problem (5.9) generally fails to comply 
with the regularity conditions (B1)-(B5). 



Proposition 5.12. Consider a stochastic program which satisfies the regularity conditions (D1)-(D5), and define the adjusted recourse problem as in (5.9) with recourse functions {Ψ_t}_{t∈τ}. Then, we find for all t ∈ τ:

(a) there exists an upper bounding vector D_t as in the definition of the correction terms (5.6);

(b) Ψ_t is a subdifferentiable saddle function on a neighborhood of Z^t being convex in η^t and jointly concave in ξ^t and the decision variables.



Assertion (b) states that for each t ∈ τ the adjusted value function Ψ_t is a convex-concave saddle function on some neighborhood of its natural domain Z^t, which is generally nonconvex. Recall that the notion of convexity (concavity) for functions on nonconvex domains has been introduced in Sect. 3.2.2.

Proof of proposition 5.12. For each t ∈ τ we will construct an extended-real-valued auxiliary function Ψ̃_t which coincides with the recourse function Ψ_t on a neighborhood of Z^t. The mapping Ψ̃_t is a saddle function being convex in η^t and jointly concave in ξ^t and the decision variables on the entire underlying space. Moreover, it is subdifferentiable and continuous on a convex neighborhood of Z^t. As a byproduct, we can recursively show the existence of a bounding vector D_t as in the definition of the correction terms (5.6).

The claim is shown by backward induction. By the assumptions (D2) and (D3) there are open (not necessarily convex) sets

$$
Y^T_\cap \subset \mathbb{R}^{n^T} \times \mathbb{R}^{L^T} \quad \text{and} \quad Y^T_\cup \subset \mathbb{R}^{K^T}
$$

such that

(i) Y^T_∩ × Y^T_∪ is a neighborhood of Y^T;

(ii) p_T is a continuous saddle function on co Y^T_∩ × co Y^T_∪ being concave in x^T, convex in η^T, and constant in ξ^T;

(iii) both f_T + κ_T and κ_T are continuous convex functions on co Y^T_∩ × co Y^T_∪.

The adjusted recourse function is given by the optimal value of the maximization problem (5.10) below.

$$
\begin{aligned}
\sup_{x_T \in \mathbb{R}^{n_T}} \;& p_T(x^T, \eta^T) + \alpha_T(\xi^T) \\
\text{s.t.} \;& f_T(x^T, \xi^T) \le 0
\end{aligned}
\tag{5.10}
$$

Notice that the parametric optimization problem (5.10) has the same structure as problem (E.1) in appendix E. Thus, proposition E.5 is applicable, implying that there exists a bounding vector D_T ∈ R^{r_T} for the Lagrange multipliers associated with the explicit constraints in (5.10) uniformly on some neighborhood of Z^T. In particular, by the propositions 5.9 and E.5, there are open (not necessarily convex) sets

$$
Z^T_\cap \subset \mathbb{R}^{n^{T-1}} \times \mathbb{R}^{L^T} \quad \text{and} \quad Z^T_\cup \subset \mathbb{R}^{K^T}
$$

such that

(iv) Z^T_∩ × Z^T_∪ is a neighborhood of Z^T;

(v) the graph of X_T over Z^T_∩ × Z^T_∪ is a subset of Y^T_∩ × Y^T_∪;

(vi) the multifunction X_T is non-empty-compact-valued on Z^T_∩ × Z^T_∪;

(vii) there exists a bounding vector D_T ∈ R^{r_T} for the Lagrange multipliers associated with the constraints in (5.10) uniformly over Z^T_∩ × Z^T_∪.
Assertion (vii) proves claim (a) for stage T.⁴ The existence of a uniform bounding vector for the Lagrange multipliers allows us to invoke the results of appendix C to derive a penalty-based formulation of problem (5.10). By corollary C.2 and the definition of the correction terms (5.6) we may conclude that the optimal value of

$$
\sup_{x_T \in \mathbb{R}^{n_T}} \; p_T(x^T, \eta^T) - \big\langle D_T, [f_T(x^T, \xi^T)]^+ + \kappa_T(\xi^T) \big\rangle \tag{5.11}
$$

is given by Ψ_T on Z^T_∩ × Z^T_∪. Next, let us define an auxiliary objective function

$$
\tilde p_T := \begin{cases}
p_T - \langle D_T, [f_T]^+ + \kappa_T \rangle & \text{on } \operatorname{co} Y^T_\cap \times \operatorname{co} Y^T_\cup, \\
+\infty & \text{on } \operatorname{co} Y^T_\cap \times (\operatorname{co} Y^T_\cup)^c, \\
-\infty & \text{everywhere else.}
\end{cases}
$$

As will be shown below, p̃_T is a continuous saddle function on the open convex set co Y^T_∩ × co Y^T_∪. In fact, the profit function p_T is a continuous saddle function as implied by assertion (ii). In addition, the penalty term

$$
-\langle D_T, [f_T]^+ + \kappa_T \rangle = -\langle D_T, \max\{f_T + \kappa_T, \kappa_T\} \rangle
$$

is continuous, concave, and independent of η^T by assertion (iii). Recall that the operator 'max' applied to a set of vectors stands for componentwise maximization. Thus, the auxiliary objective function p̃_T is continuous on co Y^T_∩ × co Y^T_∪. Moreover, by construction, p̃_T is an extended-real-valued saddle function on the entire underlying space. For the further argumentation we need the sup-projection defined through

$$
\tilde\Psi_T(x^{T-1}, \eta^T, \xi^T) := \sup_{x_T \in \mathbb{R}^{n_T}} \tilde p_T(x^T, \eta^T, \xi^T).
$$

The results on sup-projections in Sect. 3.4 guarantee that Ψ̃_T is a saddle function on the entire space (cf. also the related argument in the proof of theorem 3.14). Furthermore, Ψ̃_T is pointwise finite on the open set Z^T_∩ × Z^T_∪ due to the statements (v) and (vi). The saddle property then implies pointwise finiteness on the convex hull of Z^T_∩ × Z^T_∪. Pointwise finiteness of a saddle function on an open set is equivalent to continuity and subdifferentiability as implied by [89, theorem 10.1 and theorem 23.4]. Finally, we may conclude that Ψ̃_T = Ψ_T on Z^T_∩ × Z^T_∪ due to assertion (v) and the construction of p̃_T. Consequently, claim (b) is established for stage T, and the basis step is complete.
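The penalty reformulation (5.11) rests on an exact-penalty argument, which we assume is the content of corollary C.2 (appendix C is not part of this section). A one-dimensional toy problem, chosen purely for illustration, shows why a uniform bound on the multipliers is needed: maximize -(x - 2)^2 subject to x - 1 ≤ 0. The constrained optimum is x = 1 with multiplier 2, and the penalized problem sup_x -(x - 2)^2 - D[x - 1]^+ recovers the constrained value only when D ≥ 2. The supremum is approximated by enumeration on a grid.

```python
def objective(x):
    """Concave toy profit; its unconstrained maximizer is x = 2."""
    return -(x - 2.0) ** 2


def constraint(x):
    """Toy constraint function; the feasible set is {x : x <= 1}."""
    return x - 1.0


# Candidate decisions on [0, 4] with step 1e-3 (contains x = 1 and x = 1.5 exactly).
grid = [k / 1000.0 for k in range(4001)]


def penalized_value(D):
    """Grid approximation of sup_x objective(x) - D * [constraint(x)]^+."""
    return max(objective(x) - D * max(constraint(x), 0.0) for x in grid)


constrained = max(objective(x) for x in grid if constraint(x) <= 0.0)

# At the constrained optimum x = 1 the multiplier is objective'(1) = 2, so the
# penalty is exact for D >= 2 and too weak for D < 2.
for D in (1.0, 2.0, 3.0):
    print(D, penalized_value(D), constrained)
```

With D = 1 the penalized value is -0.75, attained at the infeasible point x = 1.5, while D = 2 and D = 3 both return the constrained value -1.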

Assume now that the assertions (a) and (b) have been proved for stage t + 1. In addition, assume that there exists a function Ψ̃_{t+1} with the postulated properties. Hence, by the assumptions (D2) and (D3) and the induction hypothesis, there are open (not necessarily convex) sets

$$
Y^t_\cap \subset \mathbb{R}^{n^t} \times \mathbb{R}^{L^t} \quad \text{and} \quad Y^t_\cup \subset \mathbb{R}^{K^t}
$$

such that

(i)* Y^t_∩ × Y^t_∪ is a neighborhood of Y^t;

(ii)* both p_t and (E_t Ψ̃_{t+1}) are continuous saddle functions on co Y^t_∩ × co Y^t_∪ being concave in x^t, convex in η^t, and constant in …;

(iii)* both f_t + κ_t and κ_t are continuous convex functions on co Y^t_∩ × co Y^t_∪;

(iv)* (E_t Ψ_{t+1}) = (E_t Ψ̃_{t+1}) on Y^t_∩ × Y^t_∪.

4 Note that the correction terms have no influence on the Lagrange multipliers since they are independent of the decision variables.

Due to assertion (iv)*, the optimal value of the maximization problem (5.12) below coincides with the adjusted recourse function Ψ_t at least on Z^t.

$$
\begin{aligned}
\sup_{x_t \in \mathbb{R}^{n_t}} \;& p_t(x^t, \eta^t) + \alpha_t(\xi^t) + (E_t \tilde\Psi_{t+1})(x^t, \eta^t, \xi^t) \\
\text{s.t.} \;& f_t(x^t, \xi^t) \le 0
\end{aligned}
\tag{5.12}
$$

As in the basis step, proposition E.5 is applicable, implying that there exists a bounding vector D_t ∈ R^{r_t} for the Lagrange multipliers associated with the explicit constraints in (5.12) uniformly over a neighborhood of Z^t. In particular, by the propositions 5.9 and E.5 there are open (not necessarily convex) sets

$$
Z^t_\cap \subset \mathbb{R}^{n^{t-1}} \times \mathbb{R}^{L^t} \quad \text{and} \quad Z^t_\cup \subset \mathbb{R}^{K^t}
$$

such that

(v)* Z^t_∩ × Z^t_∪ is a neighborhood of Z^t;

(vi)* the graph of X_t over Z^t_∩ × Z^t_∪ is a subset of Y^t_∩ × Y^t_∪;

(vii)* the multifunction X_t is non-empty-compact-valued on Z^t_∩ × Z^t_∪;

(viii)* there exists a bounding vector D_t ∈ R^{r_t} for the Lagrange multipliers associated with the constraints in (5.12) uniformly over Z^t_∩ × Z^t_∪.

The assertions (iv)* and (viii)* prove claim (a) for stage t.⁵ Moreover, the assertions (iv)* and (vi)* imply that the optimal value of (5.12) is given by Ψ_t for all parameters in Z^t_∩ × Z^t_∪. The existence of a uniform bounding vector for the Lagrange multipliers allows us to derive a penalty-based formulation of problem (5.12). By corollary C.2 and the definition of the correction terms (5.6), we may conclude that the optimal value of

$$
\sup_{x_t \in \mathbb{R}^{n_t}} \; p_t(x^t, \eta^t) + (E_t \tilde\Psi_{t+1})(x^t, \eta^t, \xi^t) - \big\langle D_t, [f_t(x^t, \xi^t)]^+ + \kappa_t(\xi^t) \big\rangle \tag{5.13}
$$

is given by Ψ_t on Z^t_∩ × Z^t_∪. Next, let us define an auxiliary objective function

$$
\tilde p_t := \begin{cases}
p_t + (E_t \tilde\Psi_{t+1}) - \langle D_t, [f_t]^+ + \kappa_t \rangle & \text{on } \operatorname{co} Y^t_\cap \times \operatorname{co} Y^t_\cup, \\
+\infty & \text{on } \operatorname{co} Y^t_\cap \times (\operatorname{co} Y^t_\cup)^c, \\
-\infty & \text{everywhere else.}
\end{cases}
$$

By the assertions (ii)* and (iii)* and by the induction hypothesis, p̃_t represents an extended-real-valued saddle function on the entire underlying space being continuous on the open convex set co Y^t_∩ × co Y^t_∪. For the further argumentation we need the sup-projection

$$
\tilde\Psi_t(x^{t-1}, \eta^t, \xi^t) := \sup_{x_t \in \mathbb{R}^{n_t}} \tilde p_t(x^t, \eta^t, \xi^t).
$$

By the same reasoning as in the basis step, one shows Ψ̃_t to be a saddle function on the entire space which is continuous on the convex hull of Z^t_∩ × Z^t_∪. Finally, we may conclude that Ψ̃_t = Ψ_t on Z^t_∩ × Z^t_∪ due to assertion (vi)* and the construction of p̃_t. Consequently, claim (b) is established for stage t, and the induction step is complete. □

5 Use proposition E.6 in the appendix, and observe that the correction terms have no influence on the Lagrange multipliers.

In the proof of proposition 5.12 we show existence of the upper bounding vectors {D_t}_{t∈τ} by means of a backward recursion scheme. In practical applications, the bounding vectors of the Lagrange multipliers may be hard to find. Suitable estimates frequently follow from a good understanding of the underlying models (see also Chap. 6).

Proposition 5.13. Consider a multistage stochastic program subject to the conditions (D1)-(D5), and define the adjusted recourse problem as in (5.9). Then, we may conclude:

(a) Ψ^l_t ≤ Ψ_t ≤ Ψ^u_t on Z^t ∀t ∈ τ;

(b) (E^l Ψ^l_0) ≤ (E Ψ_0) ≤ (E^u Ψ^u_0);

(c) Ψ_t = Φ_t + A_t and Ψ^d_t = Φ^d_t + A^d_t ∀d ∈ {l, u}, t ∈ τ;

(d) (E Ψ_0) = (E Φ_0) + (E A_0) and (E^d Ψ^d_0) = (E^d Φ^d_0) + (E^d A^d_0) ∀d ∈ {l, u}.

Proof. By proposition 5.12 the recourse function Ψ_t of the adjusted recourse problem (5.9) is a subdifferentiable saddle function on a neighborhood of its natural domain Z^t for each t ∈ τ. Hence, theorem 4.7 applies to the adjusted recourse problem (5.9) without modification of the proof. This implies that Ψ_t is squeezed between the auxiliary value functions Ψ^l_t and Ψ^u_t for all t ∈ τ, and thus the assertions (a) and (b) are established. The statements (c) and (d) are straightforward and follow directly from the definition of the adjusted recourse problem and the fact that the correction terms are independent of the decision variables. □
Theorem 5.14 (Adjusted Bounds II). Consider a stochastic program satisfying the regularity conditions (D1)-(D5). Then, the correction terms {α_t}_{t∈τ} specified in (5.6) can be used to construct (finite) bounds on the recourse function:

$$
\Phi^l_t + A^l_t - A_t \le \Phi_t \le \Phi^u_t + A^u_t - A_t \quad \text{on } Z^t \text{ for all } t \in \tau.
$$

Analogously, the optimal objective function value has bounds:

$$
(E^l \Phi^l_0) + (E^l A^l_0) - (E A_0) \le (E \Phi_0) \le (E^u \Phi^u_0) + (E^u A^u_0) - (E A_0).
$$

Proof. The claim follows directly from proposition 5.13. □
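Theorem 5.14 can be illustrated on a hypothetical single-stage problem (T = 0) whose recourse function is convex rather than concave in ξ, so that unadjusted bounds may fail. We assume ξ uniform on [0, 1] and the stage problem sup{x : x - ξ^3 ≤ 0} (with a nonbinding lower bound on x implied), hence Φ(ξ) = ξ^3 with multiplier 1, so D = 1. The map κ(ξ) = 3ξ^2 regularizes the constraint on [0, 1], since x - ξ^3 + 3ξ^2 has second ξ-derivative 6 - 6ξ ≥ 0 there, and α(ξ) = -3ξ^2. The lower and upper measures are assumed to be the endpoint (Edmundson-Madansky) and mean (Jensen) measures on [0, 1]. In a single stage, Φ^l and Φ^u coincide with Φ, so only the outer expectations differ.

```python
def phi(xi):
    """Recourse value of the toy stage problem sup{x : x - xi**3 <= 0}; convex in xi."""
    return xi ** 3


D, c = 1.0, 3.0  # multiplier bound (the multiplier is 1) and kappa(xi) = c * xi**2


def alpha(xi):
    """Correction term alpha(xi) = -D * kappa(xi) = -3 * xi**2."""
    return -D * c * xi ** 2


def lower_expectation(g):
    """Expectation under the lower measure: mass 1/2 on each endpoint of [0, 1]."""
    return 0.5 * g(0.0) + 0.5 * g(1.0)


def upper_expectation(g):
    """Expectation under the upper measure: unit mass at the mean 1/2."""
    return g(0.5)


E_phi, E_alpha = 0.25, -1.0  # exact values of E[xi**3] and E[-3 xi**2] for xi ~ U[0, 1]

naive = (lower_expectation(phi), upper_expectation(phi))
adjusted = (
    lower_expectation(phi) + lower_expectation(alpha) - E_alpha,
    upper_expectation(phi) + upper_expectation(alpha) - E_alpha,
)
print("naive   :", naive, "brackets E[phi]:", naive[0] <= E_phi <= naive[1])
print("adjusted:", adjusted, "brackets E[phi]:", adjusted[0] <= E_phi <= adjusted[1])
```

The unadjusted pair is (0.5, 0.125), which does not bracket E[Φ] = 0.25, whereas the adjusted pair is (0.0, 0.375).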

Theorem 5.15 (Convergence). Consider a stochastic program satisfying the regularity conditions (D1)-(D5). If the barycentric probability measures are regularly refined in the sense of (4.37), then the random variables A^l_t and A^u_t converge to A_t uniformly on Ξ^t while the auxiliary recourse functions Φ^l_t and Φ^u_t converge to Φ_t uniformly on Z^t for all t ∈ τ.

Proof. By assumption, the correction term α_t is uniformly continuous on Ξ^t for every t ∈ τ. Thus, convergence of A^l_t and A^u_t to A_t uniformly on Ξ^t (t ∈ τ) follows from the weak convergence of the discrete barycentric measures to the original probability measure. By proposition 5.12 the recourse function Ψ_t of the adjusted recourse problem (5.9) is a continuous saddle function on its natural domain Z^t for t ∈ τ. Thus, one can easily verify that theorem 4.8 applies to the adjusted recourse problem (5.9) without modification of the proof. This observation implies convergence of Ψ^l_t and Ψ^u_t to Ψ_t uniformly on Z^t for all t ∈ τ. Combining the above results, by proposition 5.13 (c) we have convergence of Φ^l_t and Φ^u_t to Φ_t uniformly on Z^t for all t ∈ τ. □
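Theorem 5.15 can be observed numerically on a toy problem. The sketch below is self-contained and assumes, for illustration only, a single stage with Φ(ξ) = ξ^3 and α(ξ) = -3ξ^2 (D = 1, κ(ξ) = 3ξ^2), ξ uniform on [0, 1], and a refinement that splits [0, 1] into n equal cells and applies the endpoint and midpoint measures cell by cell. This equal-cell refinement stands in for the regular refinement (4.37), whose precise form is not reproduced here.

```python
def phi(xi):
    """Toy recourse value Phi(xi) = xi**3 (convex in xi)."""
    return xi ** 3


def alpha(xi):
    """Toy correction term alpha(xi) = -3 * xi**2 (D = 1, kappa(xi) = 3 xi**2)."""
    return -3.0 * xi ** 2


def refined_expectations(g, n):
    """Lower/upper expectations of g for xi ~ U[0, 1] after splitting [0, 1] into n cells.

    On each cell of width h the lower measure puts mass h/2 on both endpoints and
    the upper measure puts mass h on the midpoint.
    """
    h = 1.0 / n
    lower = sum(0.5 * h * (g(k * h) + g((k + 1) * h)) for k in range(n))
    upper = sum(h * g((k + 0.5) * h) for k in range(n))
    return lower, upper


E_phi, E_alpha = 0.25, -1.0
results = []
for n in (1, 2, 4, 8, 16):
    phi_lo, phi_up = refined_expectations(phi, n)
    a_lo, a_up = refined_expectations(alpha, n)
    lower = phi_lo + a_lo - E_alpha   # adjusted lower bound on E[phi]
    upper = phi_up + a_up - E_alpha   # adjusted upper bound on E[phi]
    results.append((n, lower, upper))
    print(f"n = {n:2d}: {lower:.6f} <= {E_phi} <= {upper:.6f}, gap = {upper - lower:.6f}")
```

For this cubic example the gap between the bounds equals 3/(8n^2) exactly, so it shrinks quadratically in the number of cells.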



5.3 Synthesis of Results 

In this section we combine the previous results to extend the scope of the 
barycentric approximation scheme to an even broader class of convex opti- 
mization problems. Concretely speaking, we study stochastic programs of the 
form (3.1) with regularizable profit functions in the sense of definition 5.1 and 
regularizable constraint functions in the sense of definition 5.7. Such problems 
are supposed to satisfy the following regularity conditions: 



(E1) = (B1); (E2) = (C2); (E3) = (D3); (E4) = (B4); (E5) = (B5).



Here, neither the profit nor the constraint functions are assumed to be convex in the stochastic parameters. For every t ∈ τ, α^0_t denotes a suitable correction term as in definition 5.1, which is assumed to be given. Thus, by condition (E2), the regularized profit functions {p_t + α^0_t}_{t∈τ} satisfy the more restrictive condition (B2). Consequently, the stochastic program

$$
\begin{aligned}
\sup_{x \in \mathcal{N}^n} \;& \int_{\Theta \times \Xi} \sum_{t=0}^{T} \big[ p_t(x^t, \eta^t) + \alpha^0_t(\eta^t) \big] \, dP(\eta, \xi) \\
\text{s.t.} \;& f_t(x^t, \xi^t) \le 0 \quad P\text{-a.s.}, \quad t \in \tau
\end{aligned}
\tag{5.14}
$$

is subject to the regularity conditions (D1)-(D5). This observation allows us to apply the specific results of Sect. 5.2. Above all, we may conclude that there are correction terms α^1_t as in (5.6); notice that the involved bounding vectors for the Lagrange multipliers exist due to proposition 5.12 (a). Then, for all t ∈ τ define a combined correction term as

$$
\alpha_t(\eta^t, \xi^t) := \alpha^0_t(\eta^t) + \alpha^1_t(\xi^t). \tag{5.15}
$$

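Obviously, α_t is a continuous saddle function on a neighborhood of Θ^t × Ξ^t being convex in η^t and concave in ξ^t. As usual, let us introduce three sequences of additional random variables. For t ∈ τ define

$$
\begin{aligned}
A_t(\eta^t, \xi^t) &:= E_t\Big[\sum_{s=t}^{T} \alpha_s(\eta^s, \xi^s)\Big], \\
A^l_t(\eta^t, \xi^t) &:= E^l_t\Big[\sum_{s=t}^{T} \alpha_s(\eta^s, \xi^s)\Big], \\
A^u_t(\eta^t, \xi^t) &:= E^u_t\Big[\sum_{s=t}^{T} \alpha_s(\eta^s, \xi^s)\Big],
\end{aligned}
\tag{5.16}
$$

and for the sake of transparent notation introduce three constants

$$
\begin{aligned}
(E A_0) &:= E[A_0(\eta_0, \xi_0)], \\
(E^l A^l_0) &:= E^l[A^l_0(\eta_0, \xi_0)], \\
(E^u A^u_0) &:= E^u[A^u_0(\eta_0, \xi_0)].
\end{aligned}
\tag{5.17}
$$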

The random variables (5.16) as well as the constants (5.17) will be referred to as conditional correction terms below. It is important to notice that the conditional correction terms are computationally accessible. Evaluation of A^l_t and A^u_t requires calculation of a finite sum, while A_t can be evaluated by means of numerical integration techniques. Since α_t is continuous for every t ∈ τ, A_t, A^l_t, and A^u_t are continuous and bounded on Θ^t × Ξ^t. Moreover, by the saddle structure of the correction terms, we find A^l_t ≤ A_t ≤ A^u_t on Θ^t × Ξ^t, although A^l_t and A^u_t do not necessarily exhibit a saddle shape (use a similar argument as in the proof of theorem 4.7). Following (5.9), we associate to (5.14) an adjusted recourse problem with profit functions

$$
p_t + \alpha^0_t + \alpha^1_t = p_t + \alpha_t.
$$
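The saddle-structure argument for A^l_t ≤ A_t ≤ A^u_t can be sketched with a toy combined correction term α(η, ξ) = η^2 - ξ^2, which is convex in η and concave in ξ. We assume, purely for illustration, that η and ξ are independent and uniform on [0, 1]. Under this assumption the lower measure places η at its mean and ξ on the two endpoints with weights 1/2, and the upper measure does the opposite. This product form is a simplification and not the general barycentric construction for dependent parameters.

```python
def alpha(eta, xi):
    """Toy combined correction term: convex in eta, concave in xi."""
    return eta ** 2 - xi ** 2


# Illustrative assumption: eta and xi are independent and uniform on [0, 1].
vertices = (0.0, 1.0)   # endpoints of the support, each with weight 1/2
mean = 0.5              # barycenter of the support

# Lower measure: eta at its mean (Jensen), xi on the endpoints (Edmundson-Madansky).
A_lower = sum(0.5 * alpha(mean, xi) for xi in vertices)
# Upper measure: eta on the endpoints, xi at its mean.
A_upper = sum(0.5 * alpha(eta, mean) for eta in vertices)
# Exact value E[eta**2] - E[xi**2] = 1/3 - 1/3.
A_exact = 1.0 / 3.0 - 1.0 / 3.0

print(A_lower, A_exact, A_upper)
```

The three values are -0.25, 0.0, and 0.25.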

Since the correction terms are independent of the decision variables, it is obvious that for each t ∈ τ the optimal value function of the adjusted recourse problem reduces to Φ_t + A_t (cf. also the assertions (c) and (d) of proposition 5.13). Furthermore, arguing as in proposition 5.13 (a) and (b), one can show that theorem 4.7 applies to the adjusted recourse problem associated with (5.14). Thus, the auxiliary value functions Φ^l_t + A^l_t and Φ^u_t + A^u_t obtained via barycentric discretization provide lower and upper bounds, respectively, on the value function of the adjusted recourse problem. The above results culminate in the following theorem, which is clearly a generalization of the theorems 5.5 and 5.14.

Theorem 5.16 (Adjusted Bounds III). Consider a stochastic program satisfying the regularity conditions (E1)-(E5), and let {α_t}_{t∈τ} be suitable correction terms as in (5.15). Then, we find

$$
\Phi^l_t + A^l_t - A_t \le \Phi_t \le \Phi^u_t + A^u_t - A_t \quad \text{on } Z^t \text{ for all } t \in \tau.
$$

Analogously, the optimal objective function value has bounds:

$$
(E^l \Phi^l_0) + (E^l A^l_0) - (E A_0) \le (E \Phi_0) \le (E^u \Phi^u_0) + (E^u A^u_0) - (E A_0).
$$

As in the previous sections, the upper and lower bounds on the recourse 
functions become tighter as the barycentric measures are suitably refined. 
Thus, one can easily prove the following convergence result. 

Theorem 5.17 (Convergence). Consider a stochastic program satisfying the regularity conditions (E1)-(E5). If the barycentric probability measures are regularly refined in the sense of (4.37), then the random variables A^l_t and A^u_t converge to A_t uniformly on Θ^t × Ξ^t while the auxiliary recourse functions Φ^l_t and Φ^u_t converge to Φ_t uniformly on Z^t for all t ∈ τ.



5.4 Linear Stochastic Programs 



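Linear stochastic programs were first studied by Dantzig [17,18]. One of the reasons for their wide use is the existence of powerful solution algorithms, which exploit specific structural properties. Any linear multistage stochastic program can be brought to the form

$$
\begin{aligned}
\sup_{x \in \mathcal{N}^n} \;& \int_{\Theta \times \Xi} \sum_{t=0}^{T} \langle c_t(\eta^t), x_t \rangle \, dP(\eta, \xi) \\
\text{s.t.} \;& W_t(\xi^t)\, x_t + T_t(\xi^t)\, x_{t-1} \sim_t h_t(\xi^t) \quad P\text{-a.s.}, \quad t \in \tau.
\end{aligned}
\tag{5.18}
$$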
In contrast to general convex stochastic programs, the profit and constraint functions are now linear in the decision variables. However, we do not require linearity in the stochastic parameters. When dealing with linear stochastic programs, it proves useful to work explicitly with inequality (less-or-equal, greater-or-equal) and equality constraints. Thus, for notational convenience we introduce in every stage an r^{eff}_t-dimensional 'vector' ~_t of binary relations, each of whose entries is either '≤' or '≥' for inequalities, or '=' for equalities. Of course, the stochastic program (5.18) can be brought to the form (3.1) if the equality constraints are replaced by two opposing inequality constraints, and the 'greater-or-equal' constraints are multiplied by -1. Notice that the constraints in (5.18) only couple neighboring decision stages. This can always be enforced, i.e. dependencies across more than one decision stage can systematically be eliminated by introducing additional decision variables. For every t ∈ τ, the r^{eff}_t × n_t matrix W_t is termed recourse matrix and may generally depend on the stochastic parameters ξ^t. Similarly, the r^{eff}_t × n_{t-1} matrix T_t is referred to as technology matrix in the literature. Obviously, the technology matrix determines the interperiodical coupling and generally represents a random object, as it may depend on ξ^t. Moreover, the right-hand side (rhs) vector h_t and the vector of objective function coefficients c_t are random too, since they depend on ξ^t and η^t, respectively.

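After a possible reordering, the constraints in (5.18) can be written as

$$
\begin{aligned}
W^{le}_t(\xi^t)\, x_t + T^{le}_t(\xi^t)\, x_{t-1} - h^{le}_t(\xi^t) &\le 0, \\
W^{ge}_t(\xi^t)\, x_t + T^{ge}_t(\xi^t)\, x_{t-1} - h^{ge}_t(\xi^t) &\ge 0, \\
W^{eq}_t(\xi^t)\, x_t + T^{eq}_t(\xi^t)\, x_{t-1} - h^{eq}_t(\xi^t) &= 0,
\end{aligned}
$$

and for any fixed ξ^t ∈ R^{L^t} we have

$$
\begin{aligned}
& W^{le}_t \in \mathbb{R}^{r^{le}_t \times n_t}, && T^{le}_t \in \mathbb{R}^{r^{le}_t \times n_{t-1}}, && h^{le}_t \in \mathbb{R}^{r^{le}_t}, \\
& W^{ge}_t \in \mathbb{R}^{r^{ge}_t \times n_t}, && T^{ge}_t \in \mathbb{R}^{r^{ge}_t \times n_{t-1}}, && h^{ge}_t \in \mathbb{R}^{r^{ge}_t}, \\
& W^{eq}_t \in \mathbb{R}^{r^{eq}_t \times n_t}, && T^{eq}_t \in \mathbb{R}^{r^{eq}_t \times n_{t-1}}, && h^{eq}_t \in \mathbb{R}^{r^{eq}_t}.
\end{aligned}
$$

The dimensions should match up properly, i.e. r^{eff}_t = r^{le}_t + r^{ge}_t + r^{eq}_t. In this case, the vector of binary relations reads

$$
\sim_t \; = \; (\underbrace{\le, \dots, \le}_{r^{le}_t}, \underbrace{\ge, \dots, \ge}_{r^{ge}_t}, \underbrace{=, \dots, =}_{r^{eq}_t}).
$$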
With the above conventions it is easy to see that (5.18) is equivalent to a stochastic program of the form (3.1) with profit functions

$$
p_t(x^t, \eta^t) := \langle c_t(\eta^t), x_t \rangle \tag{5.19}
$$

and r_t-dimensional constraint functions

$$
f_t(x^t, \xi^t) := \begin{pmatrix}
W^{le}_t(\xi^t)\, x_t + T^{le}_t(\xi^t)\, x_{t-1} - h^{le}_t(\xi^t) \\
-W^{ge}_t(\xi^t)\, x_t - T^{ge}_t(\xi^t)\, x_{t-1} + h^{ge}_t(\xi^t) \\
W^{eq}_t(\xi^t)\, x_t + T^{eq}_t(\xi^t)\, x_{t-1} - h^{eq}_t(\xi^t) \\
-W^{eq}_t(\xi^t)\, x_t - T^{eq}_t(\xi^t)\, x_{t-1} + h^{eq}_t(\xi^t)
\end{pmatrix},
\tag{5.20}
$$

where r_t = r^{le}_t + r^{ge}_t + 2 r^{eq}_t for all t ∈ τ. The latter optimization problem is called the normal form of the linear stochastic program (5.18). As will become clear below, the formulation (5.18) accentuates the intrinsic primal-dual symmetry properties of linear programs, whereas the normal form is needed for comparison with our previous results. In the remainder of this section the regularity conditions (B1), (B4), and (B5) are assumed to hold. Now, let us briefly discuss under which conditions on c_t, W_t, T_t, and h_t the normal form of the linear stochastic program (5.18) can be expected to comply with (B2) and (B3).
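The passage from (5.18) to the normal form (5.20) is mechanical and can be sketched as follows. The helper names normal_form and matvec, the relation strings '<=', '>=', '=', and the small stage data are hypothetical. Matrices are assumed to be deterministic lists of rows; the rows may come in any order and are regrouped into the order of (5.20).

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def matvec(M: Matrix, v: Sequence[float]) -> List[float]:
    """Plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def normal_form(W: Matrix, T: Matrix, h: Sequence[float], rel: Sequence[str]
                ) -> Callable[[Sequence[float], Sequence[float]], List[float]]:
    """Build f with f(x_t, x_prev) <= 0 iff W x_t + T x_prev (rel) h, row by row.

    '<=' rows keep their sign, '>=' rows are multiplied by -1, and '=' rows are
    split into two opposing '<=' rows, ordered as in (5.20).
    """
    def f(x_t: Sequence[float], x_prev: Sequence[float]) -> List[float]:
        g = [a + b - c for a, b, c in zip(matvec(W, x_t), matvec(T, x_prev), h)]
        le = [gi for gi, r in zip(g, rel) if r == "<="]
        ge = [-gi for gi, r in zip(g, rel) if r == ">="]
        eq = [gi for gi, r in zip(g, rel) if r == "="]
        return le + ge + eq + [-gi for gi in eq]
    return f


# Hypothetical stage data: one row of each relation type.
W = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
T = [[1.0], [0.0], [0.0]]
h = [4.0, 0.0, 1.0]
rel = ["<=", ">=", "="]

f = normal_form(W, T, h, rel)
print(f([1.0, 1.0], [2.0]))   # feasible point: all components <= 0
print(f([1.0, 2.0], [2.0]))   # violates the '<=' row and one of the '=' rows
```

The output dimension is r^{le}_t + r^{ge}_t + 2r^{eq}_t = 4 here, in line with the count of r_t given after (5.20).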



Proposition 5.18. Consider a linear stochastic program which complies with the regularity conditions (B1), (B4), and (B5). Then, the corresponding normal form satisfies condition (B2) if and only if the vector-valued functions c_t are continuous, and there is a convex neighborhood of Θ^t where the following implications hold (t ∈ τ, i = 1, ..., n_t):

(i) ∀x_t ∈ X_t(Z^t): x_{t,i} ≥ 0 ⇒ c_{t,i} is convex in η^t,

(ii) ∀x_t ∈ X_t(Z^t): x_{t,i} ≤ 0 ⇒ c_{t,i} is concave in η^t, (5.21)

(iii) ∃x_t ∈ X_t(Z^t): x_{t,i} = 0 ⇒ c_{t,i} is linear affine in η^t.

Proof. The proof is merely a careful check of definitions. Recall that X_t(Z^t) is the image of the feasible set mapping X_t over Z^t, which equals the projection of Y^t on R^{n_t}. □



Proposition 5.19. Consider a linear stochastic program which complies with the regularity conditions (B1), (B4), and (B5). Then, the corresponding normal form satisfies condition (B3) if and only if the vector- and matrix-valued mappings h_t, W_t, and T_t are continuous, and there is a convex neighborhood of Ξ^t where we have (t ∈ τ, i = 1, ..., r^{eff}_t):

(i) the i'th entry of ~_t is '≥' ⇒ h_{t,i} is convex in ξ^t,

(ii) the i'th entry of ~_t is '≤' ⇒ h_{t,i} is concave in ξ^t,

(iii) the i'th entry of ~_t is '=' ⇒ h_{t,i} is linear affine in ξ^t, (5.22)

(iv) the matrices W_t and T_t are deterministic,

(v) the cone {x_t | W_t x_t ~_t 0} contains only the zero vector.
Proof. First, we show necessity. Independence of the recourse matrix W_t and technology matrix T_t from the stochastic parameters follows immediately from proposition 5.8. The properties (i), (ii), and (iii) of the rhs vectors h_t are then straightforward. Moreover, property (v) follows from boundedness of the feasible set mapping X_t on a neighborhood of Z^t.

Let us next prove sufficiency. Under the given assumptions, the constraint function (5.20) is obviously continuous and convex in the decision variables. Moreover, it is jointly convex in (x^t, ξ^t) on a convex neighborhood of Y^t. It remains to be shown that the associated feasible set mapping X_t is bounded on a neighborhood of Z^t for all t ∈ τ. As a byproduct, we will prove compactness of the generalized feasible sets.

We prove compactness of Y^t and Z^t by induction with respect to t. The basis step is trivial since Z^0 is compact by assumption (B1). Next, assume Z^t to be compact for some t ∈ τ. By assumption, the recession cone of the feasible set X_t(x^{t-1}, ξ^t) is given by the singleton {0} for every outcome and decision history in a neighborhood of Z^t. Thus, the feasible set mapping X_t is pointwise bounded on a neighborhood of Z^t. Uniform boundedness follows from continuity of the rhs vector h_t, independence of W_t and T_t from the stochastic parameters, and compactness of Z^t. Next, recall that Y^t is the graph of the bounded multifunction X_t over Z^t. Therefore, Y^t is bounded. In addition, continuity of the constraint functions guarantees that Y^t is closed and hence compact. If t = T, we are finished. Otherwise, compactness of Y^t implies compactness of Z^{t+1} = Y^t × Θ_{t+1} × Ξ_{t+1}. This observation completes the induction step, and thus the claim follows. □

Below we carry over the results of Sect. 5.3 to the case of linear stochas- 
tic programs with nonconvex objective function coefficients and rhs vectors. 
Thereby, we are naturally led to the study of functions which are representable 
as a difference of two convex functions. 



5.4.1 D.c. Functions 

Let $\Omega$ be a convex subset of a finite-dimensional Euclidean space, and denote by $CO(\Omega;\mathbb{R}^r)$ the cone of componentwise convex functions from $\Omega$ into $\mathbb{R}^r$.

Definition 5.20. A function $f:\Omega\to\mathbb{R}^r$ is called d.c. (abbreviation for difference of convex functions) if there are two functions $\kappa^+,\kappa^-\in CO(\Omega;\mathbb{R}^r)$ such that

$$f(\omega) = \kappa^+(\omega) - \kappa^-(\omega) \quad \forall\omega\in\Omega.$$

Notice that the decomposition of a d.c. function is never unique. In fact, by adding the same convex mapping to both $\kappa^+$ and $\kappa^-$, one easily obtains a new decomposition which fits the above definition. Recently, d.c. functions have attracted a great deal of attention in the field of global optimization [40,55,80]. A survey of the properties of real-valued d.c. functions can be found in [53]. Here, we recall only a few properties of vector-valued d.c. functions which are important in the present context. The class of d.c. functions $\Omega\to\mathbb{R}^r$, denoted by $DC(\Omega;\mathbb{R}^r)$, is clearly the vector space generated by the cone of convex functions from $\Omega$ into $\mathbb{R}^r$:

$$DC(\Omega;\mathbb{R}^r) = CO(\Omega;\mathbb{R}^r) - CO(\Omega;\mathbb{R}^r).$$

Some properties of $f\in DC(\Omega;\mathbb{R}^r)$ are directly inherited from those of convex functions. For instance, $f$ is locally Lipschitz on the interior of $\Omega$, and the derivative of $f$ exists almost everywhere with respect to the Lebesgue measure on $\Omega$. D.c. functions $f:\mathbb{R}\to\mathbb{R}^r$ of one real variable have a simple internal characterization: $f$ is d.c. if and only if it is locally Lipschitz and its derivative $f'$ (defined almost everywhere) is of bounded variation on every compact interval. In other words, d.c. functions on $\mathbb{R}$ are precisely the indefinite integrals of functions of locally bounded variation [53]. The following example of a Lipschitz function which is not d.c. originates from Shapiro [100].

$$f:\mathbb{R}\to\mathbb{R},\qquad \omega\mapsto \inf_{n\in\mathbb{N}}\left|\omega - \tfrac{1}{n}\right|$$

Obviously, $f$ is Lipschitz continuous with modulus 1, and the derivative $f'$ exists almost everywhere. However, $f'$ oscillates between $+1$ and $-1$ infinitely often in any neighborhood of the origin. Thus, $f'$ is not of bounded variation, implying that $f$ is not d.c. It is worth remarking that no simple and useful internal characterization of d.c. functions of more than one real variable is known.
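The oscillation argument can be checked numerically. The Python sketch below (all helper names are ours and purely illustrative) restricts the infimum to the points $1/n$ with $n \le N+1$, which leaves $f$ unchanged on the interval $(1/(N+1), 1)$ under examination, samples $f'$ on a fine grid, and sums its jumps. The resulting total variation $2(2N-1)$ grows without bound as the interval reaches closer to the origin, in line with the claim that $f'$ is not of bounded variation.

```python
def shapiro_f(w: float, n_max: int) -> float:
    """Truncated Shapiro function: min over n = 1..n_max of |w - 1/n|."""
    return min(abs(w - 1.0 / n) for n in range(1, n_max + 1))


def shapiro_fprime(w: float, n_max: int) -> float:
    """Derivative of the truncated function away from its kinks.

    Near w the function coincides with |w - p| for the closest point p = 1/n,
    so the derivative is +1 to the right of p and -1 to the left of p.
    """
    nearest = min((1.0 / n for n in range(1, n_max + 1)), key=lambda p: abs(w - p))
    return 1.0 if w > nearest else -1.0


def variation_of_fprime(N: int, h: float = 1e-4) -> float:
    """Total variation of f' on (1/(N+1), 1), estimated by sampling f' on a grid.

    The grid step h must be smaller than the shortest linear piece,
    1 / (2 N (N+1)); otherwise sign changes would be missed.
    """
    n_max = N + 1  # points 1/n with n > N+1 are never the closest one on this interval
    w = 1.0 / (N + 1) + h / 2
    prev = shapiro_fprime(w, n_max)
    total = 0.0
    while w + h < 1.0:
        w += h
        cur = shapiro_fprime(w, n_max)
        total += abs(cur - prev)  # each sign change contributes |(+1) - (-1)| = 2
        prev = cur
    return total


for N in (5, 10, 20):
    print(N, variation_of_fprime(N))  # grows like 2 * (2N - 1)
```

Because $f'$ switches sign at every point $1/n$ and at every midpoint between consecutive such points, the variation increases linearly in $N$.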

The class of twice continuously differentiable functions from $\Omega$ into $\mathbb{R}^r$, i.e. $C^2(\Omega;\mathbb{R}^r)$, is a linear subspace of $DC(\Omega;\mathbb{R}^r)$. This can easily be proved if $\Omega$ is compact (cf. also the related propositions 5.3 and 5.11, whose proofs rely on similar arguments). However, the statement remains true for $\Omega$ open or unbounded, as pointed out by Hartman [52]. Moreover, let $C^0(\Omega;\mathbb{R}^r)$ be the space of continuous mappings $\Omega\to\mathbb{R}^r$, endowed with the topology of uniform convergence with respect to the Euclidean norm in $\mathbb{R}^r$. Then, $DC(\Omega;\mathbb{R}^r)$ is dense in $C^0(\Omega;\mathbb{R}^r)$ provided that $\Omega$ is compact. This follows directly from the Stone-Weierstrass theorem and the fact that $DC(\Omega;\mathbb{R}^r)$ contains all vector-valued polynomials.
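For a compact interval, one standard way to see that a $C^2$ function is d.c. is the quadratic shift $f = (f + \tfrac{\rho}{2}|x|^2) - \tfrac{\rho}{2}|x|^2$, where $\rho$ bounds the negative curvature of $f$ ($f'' \ge -\rho$). We assume this is close to the kind of argument the text alludes to, but it is offered only as an illustration. The Python sketch below (hypothetical helper names) applies the shift to $\sin$ on $[0, 2\pi]$ with $\rho = 1$ and checks convexity through nonnegative second differences on a grid. This is a numerical check, not a proof.

```python
import math
from typing import Callable, Tuple

Fn = Callable[[float], float]


def dc_split(f: Fn, rho: float) -> Tuple[Fn, Fn]:
    """Split f = kappa_plus - kappa_minus by adding and subtracting (rho/2) x^2.

    If f'' >= -rho on the interval of interest, both parts are convex there.
    """
    def kappa_plus(x: float) -> float:
        return f(x) + 0.5 * rho * x * x

    def kappa_minus(x: float) -> float:
        return 0.5 * rho * x * x

    return kappa_plus, kappa_minus


def convex_on_grid(g: Fn, a: float, b: float, n: int = 2000, tol: float = 1e-9) -> bool:
    """Check that all second differences of g on a uniform grid of [a, b] are >= -tol."""
    h = (b - a) / n
    xs = [a + k * h for k in range(n + 1)]
    return all(g(xs[k - 1]) - 2.0 * g(xs[k]) + g(xs[k + 1]) >= -tol for k in range(1, n))


# Example: f = sin on [0, 2*pi] satisfies f'' = -sin >= -1, so rho = 1 suffices.
kp, km = dc_split(math.sin, rho=1.0)
print(convex_on_grid(math.sin, 0.0, 2 * math.pi))  # sin itself is not convex
print(convex_on_grid(kp, 0.0, 2 * math.pi), convex_on_grid(km, 0.0, 2 * math.pi))
```

On an unbounded or open domain, no uniform $\rho$ need exist. This is why the general statement requires the more delicate argument attributed to Hartman.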



5.4.2 Generalized Bounds for Linear Stochastic Programs 



In the remainder of this section we study linear multistage stochastic programs of the form (5.18) which satisfy the following regularity conditions:

(F1) = (B1);

(F2) the objective function coefficients $c^*_t$ are globally continuous and d.c. on a convex neighborhood of $\Theta^t$, $t\in\tau$;

(F3) - the rhs vector $h_t$ is globally continuous and d.c. on a convex neighborhood of $\Xi^t$, $t\in\tau$;
  - the matrices $W_t$ and $T_t$ are globally continuous and constant on a convex neighborhood of $\Xi^t$, $t\in\tau$;
  - the recession cone $\{x_t \mid W_t x_t \trianglelefteq 0\}$ is given by $\{0\}$ on a convex neighborhood of $\Xi^t$, $t\in\tau$;

(F4) = (B4); (F5) = (B5).



As the random vector $c^*_t$ is d.c., there are two convex mappings $\kappa^{*+}_t$ and $\kappa^{*-}_t$ on a closed convex neighborhood of $\Theta^t$ such that

$$c^*_t = \kappa^{*+}_t - \kappa^{*-}_t \quad \forall t\in\tau. \tag{5.23}$$

By the Tietze extension theorem [110, p. 103], both $\kappa^{*+}_t$ and $\kappa^{*-}_t$ can be extended to globally continuous functions which satisfy (5.23) on all of $\mathbb{R}^K$. Similarly, since $h_t$ is d.c., we may conclude that there are two convex mappings $\kappa^+_t$ and $\kappa^-_t$ on a closed convex neighborhood of $\Xi^t$ with

$$h_t = \kappa^+_t - \kappa^-_t \quad \forall t\in\tau. \tag{5.24}$$

As in the case of the objective function coefficients, $\kappa^+_t$ and $\kappa^-_t$ have continuous extensions which satisfy (5.24) on all of $\mathbb{R}^L$. In the next proposition, we establish a link to the results of Sect. 5.3.



Proposition 5.21. The linear stochastic program (5.18) satisfies (F1)-(F5) 
if and only if the corresponding normal form satisfies (E1)-(E5). 



Proof. First, we show necessity. If the linear stochastic program (5.18) satisfies the regularity conditions (F1)-(F5), then the rhs vectors can be decomposed as in (5.24). Next, rewrite $\kappa^\pm_t$ as $(\kappa^{\mathrm{le}\pm}_t, \kappa^{\mathrm{ge}\pm}_t, \kappa^{\mathrm{eq}\pm}_t)$. This representation reflects the classification of constraints by the attributes 'less-or-equal', 'greater-or-equal', and 'equality'. With this convention, one can define a globally continuous mapping

$$K_t := (\kappa^{\mathrm{le}+}_t,\ \kappa^{\mathrm{ge}-}_t,\ \kappa^{\mathrm{eq}+}_t),$$
which is componentwise convex on a convex neighborhood of $\Xi^t$. Now, consider the constraint function $f_t$ defined in (5.20). As the involved recourse and technology matrices are deterministic by hypothesis, it can easily be verified that $f_t + K_t$ is jointly convex in $(x^t, \xi^t)$ and constant in $\eta^t$ on a convex neighborhood of $Y^t$. Thus, the constraint function (5.20) is regularizable in the sense of definition 5.7. Next, arguing as in the proof of proposition 5.19, one can show that $Y^t$ is compact and that the feasible set mapping $X_t$ is bounded on a neighborhood of $Z^t$ for every $t\in\tau$. This proves (E3). Compactness of $Y^t$ implies that there exist nonnegative bounding vectors $X^+_t$ and $X^-_t$ such that

$$-X^-_t < x_t < X^+_t \quad \forall x_t\in X_t(Z^t).$$

Recall that $X_t(Z^t)$ is the projection of $Y^t$ on $\mathbb{R}^{n_t}$ and is thus compact. By hypothesis, the objective function coefficients can be decomposed as in (5.23). The involved mappings $\kappa^{*+}_t$ and $\kappa^{*-}_t$ are then used to define suitable correction terms $\alpha^0_t$ as in definition 5.1:

$$\alpha^0_t(\eta^t) := \langle\kappa^{*+}_t(\eta^t), X^-_t\rangle + \langle\kappa^{*-}_t(\eta^t), X^+_t\rangle.$$

Let $p_t$ be the profit function (5.19). Then, we find

$$p_t(x^t,\eta^t,\xi^t) + \alpha^0_t(\eta^t) = \langle\kappa^{*-}_t(\eta^t), X^+_t - x_t\rangle + \langle\kappa^{*+}_t(\eta^t), X^-_t + x_t\rangle.$$

Both terms on the right hand side of the above equation are manifestly linear affine in $x_t$ and constant in $\xi^t$. They are also convex in $\eta^t$ on a convex neighborhood of $Y^t$, because the strict inequalities above keep the weights $X^+_t - x_t$ and $X^-_t + x_t$ positive there. Thus, the profit function (5.19) is regularizable in the sense of definition 5.1, and condition (E2) holds true. In conclusion, the normal form of the linear stochastic program (5.18) satisfies all conditions (E1)-(E5).

Next, we establish sufficiency. By assumption, the normal form of (5.18) is subject to (E1)-(E5). Thus, there is a mapping $\alpha^0_t$ as in definition 5.1 such that

$$\langle c^*_t(\eta^t), x_t\rangle + \alpha^0_t(\eta^t) \tag{5.25}$$

is a saddle function on a convex neighborhood of $Y^t$, being linear affine in $x_t$, convex in $\eta^t$, and constant in $\xi^t$. Next, choose an arbitrary $y_t\in X_t(Z^t)$, which is kept fixed. Then, there is a real number $\varepsilon > 0$ such that (5.25) is convex in $\eta^t$ on a convex neighborhood of $\Theta^t$ for all $x_t$ in the open ball $B_{2\varepsilon}(y_t)$. A fortiori, the real-valued mapping

$$\vartheta^0_t(\eta^t) := \frac{1}{\varepsilon}\Big(\langle c^*_t(\eta^t), y_t\rangle + \alpha^0_t(\eta^t)\Big)$$

is globally continuous and convex on a convex neighborhood of $\Theta^t$. For the further argumentation we need the standard basis of $\mathbb{R}^{n_t}$, which is denoted by $\{e_{t,i}\}_{i=1}^{n_t}$. Then, for every coordinate index $i = 1,\dots,n_t$, the vector $y_t + \varepsilon e_{t,i}$ is an element of $B_{2\varepsilon}(y_t)$. Consequently, the real function

$$\vartheta^i_t(\eta^t) := \frac{1}{\varepsilon}\Big(\langle c^*_t(\eta^t), y_t + \varepsilon e_{t,i}\rangle + \alpha^0_t(\eta^t)\Big)$$

is globally continuous and convex on a convex neighborhood of $\Theta^t$. Using the definitions

$$\kappa^{*+}_t := (\vartheta^1_t,\dots,\vartheta^{n_t}_t) \quad\text{and}\quad \kappa^{*-}_t := (\vartheta^0_t,\dots,\vartheta^0_t),$$

we can rewrite the vector of objective function coefficients as $c^*_t = \kappa^{*+}_t - \kappa^{*-}_t$. Indeed, $\vartheta^i_t - \vartheta^0_t = \langle c^*_t, e_{t,i}\rangle$ is the $i$-th component of $c^*_t$. This implies that $c^*_t$ is d.c. on a convex neighborhood of $\Theta^t$, and thus (F2) is established. In a next step we consider the constraints. By the regularity condition (E3), which is assumed to hold true in the present context, the constraint function (5.20) is regularizable in the sense of definition 5.7. Thus, proposition 5.8 guarantees that the involved recourse and technology matrices are constant, whereas the random vectors $h^{\mathrm{le}}_t$, $h^{\mathrm{ge}}_t$, and $h^{\mathrm{eq}}_t$ are d.c. on a convex neighborhood of $\Xi^t$. Finally, we have to prove that the recession cone $\{x_t \mid W_t x_t \trianglelefteq 0\}$ of the feasible set $X_t(x^{t-1},\xi^t)$ contains only the zero vector for all outcome and decision histories $(x^{t-1},\eta^t,\xi^t)$ in some neighborhood of $Z^t$. This follows immediately from boundedness of the feasible set mapping $X_t$ on a neighborhood of $Z^t$. Consequently, (F3) is established. In summary, the linear stochastic program (5.18) complies with all of the regularity conditions (F1)-(F5). □

By proposition 5.21, the theoretical framework of Sect. 5.3 applies to any linear stochastic program of the form (5.18) which satisfies the regularity conditions (F1)-(F5). This implies, among other things, that the discrete barycentric measures $P^l$ and $P^u$ as well as the value functions $\Phi_t$, $\Phi^l_t$, and $\Phi^u_t$ are well-defined, and that the static and dynamic versions of (5.18) are equivalent and solvable. Moreover, the optimal value of (5.18) has computationally accessible bounds, and the recourse functions $\Phi_t$ of the individual decision stages are squeezed between the auxiliary value functions $\Phi^l_t$ and $\Phi^u_t$, each shifted by specific random variables. In order to construct these random variables, it is necessary to identify suitable correction terms of the form (5.15). To this end, we consider the dynamic version of the linear stochastic program (5.18). Then, the parametric stage $t$ problem is given by

$$\begin{aligned}
\sup_{x_t\in\mathbb{R}^{n_t}}\ & q_t(x^t,\eta^t,\xi^t)\\
\text{s.t. }\ & W_t(\xi^t)\,x_t + T_t(\xi^t)\,x_{t-1} \trianglelefteq h_t(\xi^t),
\end{aligned}\tag{5.26}$$

where

$$q_t(x^t,\eta^t,\xi^t) := \begin{cases}\langle c^*_t(\eta^t), x_t\rangle + (E_t\Phi_{t+1})(x^t,\eta^t,\xi^t) & \text{for } t < T,\\ \langle c^*_T(\eta^T), x_T\rangle & \text{for } t = T.\end{cases}$$

Notice again that this representation explicitly deals with inequality (less-or-equal, greater-or-equal) and equality constraints. Let

$$X_{\mathrm{opt},t}(x^{t-1},\eta^t,\xi^t)\subset\mathbb{R}^{n_t} \quad\text{and}\quad D^*_{\mathrm{opt},t}(x^{t-1},\eta^t,\xi^t)\subset\mathbb{R}^{r_t}$$

be the primal and dual solution sets associated with the parametric optimization problem (5.26), respectively. Based on the reasoning of Sect. 5.2 and before, one can show that the multifunctions $X_{\mathrm{opt},t}$ and $D^*_{\mathrm{opt},t}$ are nonempty-valued, bounded, and upper semicontinuous on a neighborhood of $Z^t$. Thus, there are nonnegative bounding vectors $X^+_t, X^-_t\in\mathbb{R}^{n_t}$ and $D^{*+}_t, D^{*-}_t\in\mathbb{R}^{r_t}$ such that

$$-X^-_t < x_t < X^+_t \quad \forall x_t\in X_{\mathrm{opt},t}(Z^t) \tag{5.27a}$$

and

$$-D^{*-}_t < d^*_t < D^{*+}_t \quad \forall d^*_t\in D^*_{\mathrm{opt},t}(Z^t). \tag{5.27b}$$

Without loss of generality, we may assume that (5.27a) holds not only for all optimal decisions in $X_{\mathrm{opt},t}(Z^t)$ but also for all feasible decisions in $X_t(Z^t)$. Otherwise, one can add deterministic constraints to the parametric optimization problem (5.26) which guarantee the strict inequalities (5.27a) for all $x_t\in X_t(Z^t)$ and which are non-binding at the optimum for any choice of the parameters in $Z^t$. Slackness implies that these additional constraints have no influence on the solution of (5.26), so they may be incorporated without concern. By means of the bounding vectors for the primal and dual solutions it is possible to define appropriate correction terms $\alpha^0_t$ and $\alpha^1_t$ as in Sects. 5.1 and 5.2, respectively:

$$\alpha^0_t(\eta^t) := \langle\kappa^{*+}_t(\eta^t), X^-_t\rangle + \langle\kappa^{*-}_t(\eta^t), X^+_t\rangle, \tag{5.28a}$$

$$\alpha^1_t(\xi^t) := -\langle D^{*+}_t, \kappa^+_t(\xi^t)\rangle - \langle D^{*-}_t, \kappa^-_t(\xi^t)\rangle. \tag{5.28b}$$

These definitions reflect the intrinsic primal-dual symmetry of linear (stochastic) programs. Indeed, the correction term $\alpha^0_t$ associated with the nonconvexities in the objective function has the same general structure as the correction term $\alpha^1_t$ corresponding to the nonconvexities in the constraints. Concretely speaking, (5.28a) pairs the bounding vectors of the primal solutions with the d.c. components of the objective function coefficients, whereas (5.28b) pairs the bounding vectors of the dual solutions with the d.c. components of the rhs vector.
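To see the primal correction term (5.28a) at work, the following Python sketch uses a scalar toy case of our own choosing. It takes $n_t = 1$, $c^*_t(\eta) = \sin\eta$ on $[0,2\pi]$ with $\kappa^{*+}_t = \sin\eta + \eta^2/2$ and $\kappa^{*-}_t = \eta^2/2$, and hypothetical bounds $X^-_t = 1$, $X^+_t = 2$. The sketch checks on a grid that the uncorrected profit $c^*_t(\eta)\,x$ is not convex in $\eta$ in general, whereas the corrected profit is convex for every $x\in[-X^-_t, X^+_t]$. All names are illustrative, and the grid test is a numerical check rather than a proof.

```python
import math

# Hypothetical scalar data: decisions bounded by -X_MINUS <= x <= X_PLUS as in (5.27a).
X_PLUS, X_MINUS = 2.0, 1.0


def c_star(eta: float) -> float:
    return math.sin(eta)


def kappa_plus(eta: float) -> float:    # convex on [0, 2*pi]
    return math.sin(eta) + 0.5 * eta * eta


def kappa_minus(eta: float) -> float:   # convex everywhere
    return 0.5 * eta * eta


def alpha0(eta: float) -> float:
    """Scalar version of (5.28a): kappa*+(eta) X^- + kappa*-(eta) X^+."""
    return kappa_plus(eta) * X_MINUS + kappa_minus(eta) * X_PLUS


def corrected_profit(x: float, eta: float) -> float:
    """p(x, eta) + alpha0(eta) with p(x, eta) = c*(eta) x."""
    return c_star(eta) * x + alpha0(eta)


def convex_in_eta(g, a: float = 0.0, b: float = 2 * math.pi,
                  n: int = 2000, tol: float = 1e-9) -> bool:
    """Nonnegativity of all second differences of g on a uniform grid of [a, b]."""
    h = (b - a) / n
    etas = [a + k * h for k in range(n + 1)]
    return all(g(etas[k - 1]) - 2 * g(etas[k]) + g(etas[k + 1]) >= -tol for k in range(1, n))


for x in (-1.0, 0.0, 1.0, 2.0):
    uncorrected = convex_in_eta(lambda e: c_star(e) * x)
    corrected = convex_in_eta(lambda e: corrected_profit(x, e))
    print(x, uncorrected, corrected)
```

The corrected profit equals $\kappa^{*-}_t(\eta)(X^+_t - x) + \kappa^{*+}_t(\eta)(X^-_t + x)$, which is a nonnegative combination of convex functions. The check therefore fails only if $x$ leaves the bounding box.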

In proposition 5.21 we have shown that $\alpha^0_t$ is in fact a suitable correction term in the sense of definition 5.1. Remember that, without loss of generality, the estimate (5.27a) may be assumed to hold for all feasible decisions in $X_t(Z^t)$. Furthermore, it is obvious that $\alpha^1_t$ is of the form (5.6). Thus, it is a valid correction term in the sense of Sect. 5.2. Summing up both correction terms as in (5.15), one obtains an overall correction term $\alpha_t := \alpha^0_t + \alpha^1_t$. Next, introduce the conditional correction terms as in (5.16) and (5.17). Then, the conclusions of theorem 5.16 hold true, i.e. we have

$$\Phi^l_t + A^l_t - A_t \le \Phi_t \le \Phi^u_t + A^u_t - A_t \quad\text{on } Z^t \text{ for all } t\in\tau \tag{5.29a}$$

and

$$(E^l\Phi^l_0) + (E^lA^l_0) - (EA_0) \le (E\Phi_0) \le (E^u\Phi^u_0) + (E^uA^u_0) - (EA_0). \tag{5.29b}$$

By (5.28) the random variables $A_t$, $A^l_t$, and $A^u_t$ are linear functionals of the bounding vectors for the primal and dual solution sets. Continuity then implies that (5.29) even holds if the strict inequalities in (5.27) are replaced by ordinary inequalities. Finally, it is also clear that the conclusions of theorem 5.17 remain valid, i.e. the upper and lower bounds on the recourse functions can be made arbitrarily tight by suitably refining the barycentric probability measures.



5.5 Bounding Sets for the Optimal Decisions 

In Sect. 5.3 we found computable bounds bracketing the optimal value of a stochastic program which satisfies the regularity conditions (E1)-(E5). Recall that these conditions are weaker than any other set of regularity conditions considered in the present chapter. Below, we argue that bounds on the optimal value also entail bounding sets for the optimal first stage decisions. As in Sect. 4.6, we explicitly work with the refinement parameter $J\in\mathbb{N}$ and use the standard assumption that the random outcomes at stage 0 are deterministic. Thus, we consider $(\eta_0,\xi_0)$ as a fixed parameter in the following exposition. Given any stochastic program subject to the conditions (E1)-(E5), we may introduce three extended-real-valued functionals

$$\begin{aligned}
F(x_0) &:= p_0(x_0,\eta_0,\xi_0) + (E\Phi_1)(x_0,\eta_0,\xi_0),\\
F^l_J(x_0) &:= p_0(x_0,\eta_0,\xi_0) + (E^l_J\Phi^l_{1,J})(x_0,\eta_0,\xi_0) + A^l_{0,J}(\eta_0,\xi_0) - A_0(\eta_0,\xi_0),\\
F^u_J(x_0) &:= p_0(x_0,\eta_0,\xi_0) + (E^u_J\Phi^u_{1,J})(x_0,\eta_0,\xi_0) + A^u_{0,J}(\eta_0,\xi_0) - A_0(\eta_0,\xi_0).
\end{aligned}$$

Then, the optimal value of the stochastic program under consideration is given by $\max F$, while the set of optimal first stage decisions coincides with $\operatorname{argmax} F$. By theorem 5.16 we find $F^l_J \le F \le F^u_J$ on $\mathbb{R}^{n_0}$. Thus, the functional $F$ is squeezed between the auxiliary functionals $F^l_J$ and $F^u_J$ on the entire underlying space. This observation implies that the maximizers of $F$ are necessarily contained in the set of all $x_0$ such that $F^u_J(x_0)$ is greater than or equal to the maximum of the lower bounding function $F^l_J$, i.e.

$$\operatorname{argmax} F \subset C_J := \{x_0 \mid F^u_J(x_0) \ge \sup F^l_J\}.$$

Moreover, theorem 5.17 implies that the sequences $\{F^l_J\}_{J\in\mathbb{N}}$ and $\{F^u_J\}_{J\in\mathbb{N}}$ converge to $F$ uniformly on its effective domain. Then, in analogy to theorem 4.9, we derive the following convergence result.







Theorem 5.22. Consider a multistage stochastic program subject to the regularity conditions (E1)-(E5). Then, the sequence of upper level sets $\{C_J\}_{J\in\mathbb{N}}$ converges to $\operatorname{argmax} F$ provided that the underlying refinement strategy is regular in the sense of (4.37).
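On a finite grid of candidate first stage decisions, the upper level set $C_J$ is easy to evaluate once the bounding functionals are available. The Python sketch below is only illustrative. It replaces the barycentric bounds $F^l_J$, $F^u_J$ with the hypothetical pair $F \mp \text{gap}$ for a one-dimensional toy objective $F(x) = -(x-1)^2$, and it shows how the set shrinks towards $\operatorname{argmax} F = \{1\}$ as the gap closes. Function names are ours.

```python
from typing import Callable, List

Fn = Callable[[float], float]


def bounding_set(xs: List[float], f_lower: Fn, f_upper: Fn) -> List[float]:
    """Grid version of C_J = {x0 | F^u_J(x0) >= sup F^l_J}.

    Assumes f_lower <= F <= f_upper pointwise; every maximizer of F on the grid
    then belongs to the returned list.
    """
    best_lower = max(f_lower(x) for x in xs)
    return [x for x in xs if f_upper(x) >= best_lower]


def F(x: float) -> float:
    """Hypothetical toy objective with unique maximizer x = 1."""
    return -(x - 1.0) ** 2


xs = [k / 1000 for k in range(-1000, 3001)]   # grid on [-1, 3] containing x = 1
for gap in (0.5, 0.05, 0.005):
    C = bounding_set(xs, lambda x: F(x) - gap, lambda x: F(x) + gap)
    print(gap, min(C), max(C))  # half-width of C is about sqrt(2 * gap)
```

For this toy example, $C_J = \{x_0 : (x_0-1)^2 \le 2\,\text{gap}\}$. Uniform convergence of the bounds therefore translates directly into shrinking bounding sets.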




6 Applications in the Power Industry



Problems of power systems planning and operation as well as energy trading 
are often addressed with methods of stochastic programming. A literature review of relevant work in this field is provided in [106]. The existing (both static
and dynamic) stochastic programming models can be classified according to 
their planning horizon. Long-term planning models deal with investments and 
typically have a horizon of up to 20 years. Medium-term planning is done over 
a range of 1 to 3 years and is concerned with resource management (mainly 
reservoir management). Short-term planning has a horizon of at most one 
week and typically deals with unit commitment and economic dispatch. 

A second possibility to categorize energy models in stochastic program- 
ming arises from the ongoing deregulation of electricity markets. In fact, one 
may distinguish cost minimization models for an entire system and profit 
maximization models for individual agents. In a regulated market one usu- 
ally adopts the perspective of a social planner. Then, the control objective 
is to meet demand at minimal cost with available generation equipment. In 
a liberalized market with a power exchange, however, it suffices to consider 
single agents, who choose their operating and trading decisions on the basis of 
price information. Consequently, it is not necessary to model the entire power 
system. All information about aggregate demand and supply as well as avail- 
ability of resources and network congestions, which is relevant for an agent at a 
specific location, is contained in the local electricity prices. In other words, the 
local prices completely reflect the state of the system, and additional random 
variables become obsolete. 

There exists a vast variety of solution methods to address the above opti- 
mization problems. For instance, stochastic dynamic programming (SDP) has 
been used for a long time to solve several types of energy models, cf. the sur- 
veys [101,112,113]. In SDP the dynamic program (3.2) is solved directly by 
backward recursion. Then, the value function is approximately evaluated at 
finitely many points and interpolated in between. Notice that this approach 
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is only implementable if the value function has few arguments. Otherwise, 
computational effort explodes by the curse of dimensionality. The limitations 
of SDP are discussed in [82]. One solution method which overcomes some of 
the deficiencies of SDP is stochastic dual dynamic programming (SDDP) due 
to Pereira and Pinto [81-83] - see also the case study in [65]. This particu- 
lar method is well suited for the solution of linear stochastic programs with 
non-anticipative constraint multifunction, provided that the expectation func- 
tional of any time stage is jointly concave in all of its arguments. Then, each 
expectation functional can be approximated by the lower envelope of finitely 
many hyperplanes. SDDP is mostly used for medium-term planning of reg- 
ulated hydro-thermal power systems. For completeness, it is worthwhile to 
mention that SDP can in principle be applied to any (potentially nonconvex) stochastic model, whereas the use of SDDP is limited to convex stochastic programs with a special dependence on the random parameters.
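The SDP idea of solving the dynamic program by backward recursion, storing the value function only at finitely many points, and interpolating in between can be sketched for a toy storage problem. The assumptions are ours and are not taken from the cited surveys: a single reservoir whose state $s$ lies on a uniform grid, i.i.d. discrete prices observed before the sale $u$, spillage when capacity is exceeded, and linear interpolation between grid points. All names are hypothetical.

```python
import bisect
from typing import List, Sequence, Tuple


def interp(grid: Sequence[float], values: Sequence[float], x: float) -> float:
    """Piecewise-linear interpolation of (grid, values) at x, clamped to the grid range."""
    if x <= grid[0]:
        return values[0]
    if x >= grid[-1]:
        return values[-1]
    k = bisect.bisect_right(grid, x) - 1
    w = (x - grid[k]) / (grid[k + 1] - grid[k])
    return (1.0 - w) * values[k] + w * values[k + 1]


def sdp_storage(T: int, s_max: float, n_grid: int, prices: Sequence[Tuple[float, float]],
                u_max: float, inflow: float = 0.0) -> Tuple[List[float], List[float]]:
    """Backward recursion V_t(s) = E_p[max_u p*u + V_{t+1}(min(s - u + inflow, s_max))].

    The value function is stored only at the grid points and interpolated in between.
    Prices are i.i.d. across stages and observed before the sales decision u.
    """
    grid = [s_max * k / (n_grid - 1) for k in range(n_grid)]
    v_next = [0.0] * n_grid                      # V_{T+1} = 0
    for _ in range(T + 1):                       # stages T, T-1, ..., 0
        v_cur = []
        for s in grid:
            # finite decision set: sell down to any lower grid point, at most u_max
            candidates = [s - g for g in grid if g <= s and s - g <= u_max]
            expected = 0.0
            for p, prob in prices:
                expected += prob * max(
                    p * u + interp(grid, v_next, min(s - u + inflow, s_max))
                    for u in candidates)
            v_cur.append(expected)
        v_next = v_cur
    return grid, v_next


grid, v0 = sdp_storage(T=1, s_max=4.0, n_grid=5, prices=[(10.0, 0.5), (20.0, 0.5)], u_max=4.0)
print(v0)  # [0.0, 17.5, 35.0, 52.5, 70.0]
```

The nested loops over grid points, price outcomes and candidate decisions show why the effort explodes once the value function depends on more than a few arguments. A $d$-dimensional state would require a grid with $n_{\text{grid}}^d$ points.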

Short-term planning models are usually nonconvex as they include com- 
mitment decisions (binary ‘on-off’ decisions). Other integer variables might 
arise due to the precise modelling of nonlinear phenomena and technical 
restrictions. Stochastic models with commitment decisions are considered 
in [4,86,102-104]. However, for any reasonable discretization of the under- 
lying probability space, the associated mixed integer linear programs (MILP) 
become too large to be solved directly - even in the two-stage case. The stan- 
dard approach to overcome this difficulty involves Lagrangian relaxation of 
certain constraints and solution of the corresponding dual minimization prob- 
lem (see e.g. [21] for an intuitive explanation of the Lagrangian relaxation 
approach). Thereby, one exploits the fact that the dual objective function 
frequently decomposes into small subproblems which can be tackled with a 
common MILP solver. Moreover, the dual objective function is convex in the 
Lagrange multipliers, and thus the standard tools of convex analysis can be 
used for its minimization (most authors favor specific subgradient methods or 
proximal bundle methods [37,54,63]). Unfortunately, due to the integrality restrictions there is in general a non-vanishing duality gap, which implies that the optimal value of the dual problem is merely an upper bound for the optimal value of the primal problem. Sometimes the duality gap can be shown to be small. Then, one attempts to find a nearly optimal primal solution by means of a suitable heuristic (see [22] for an analysis of the duality gaps corresponding to different relaxation schemes). In addition, the upper bounds obtained via dualization are often used in the bounding procedure of a branch and bound algorithm [15,77]; cf. also the slightly different approach in [68]. Notice that branch and bound algorithms can in principle find the primal optimum with arbitrary precision.
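A toy illustration of Lagrangian relaxation follows. In this sketch, a binary "knapsack" with a single coupling constraint stands in for the component coupling constraints. Relaxing the constraint with a multiplier $\lambda \ge 0$ lets the dual function separate into one trivial subproblem per component. We minimize the convex dual by a plain projected subgradient method, which is one of several possible choices, and compare the result with the brute-force primal optimum. The instance data and function names are hypothetical.

```python
from itertools import product
from typing import Sequence, Tuple


def primal_optimum(values: Sequence[float], weights: Sequence[float], cap: float) -> float:
    """Brute-force max sum v_i z_i s.t. sum w_i z_i <= cap, z binary (tiny instances only)."""
    best = float("-inf")
    for z in product((0, 1), repeat=len(values)):
        if sum(w * zi for w, zi in zip(weights, z)) <= cap:
            best = max(best, sum(v * zi for v, zi in zip(values, z)))
    return best


def dual_function(lam: float, values, weights, cap) -> Tuple[float, float]:
    """L(lam) = lam*cap + sum_i max(0, v_i - lam*w_i) and a subgradient of L at lam.

    After relaxing the coupling constraint, each binary z_i is chosen independently.
    """
    z = [1 if v - lam * w > 0 else 0 for v, w in zip(values, weights)]
    value = lam * cap + sum(zi * (v - lam * w) for zi, v, w in zip(z, values, weights))
    subgrad = cap - sum(w * zi for w, zi in zip(weights, z))
    return value, subgrad


def subgradient_dual_bound(values, weights, cap, iters: int = 5000) -> float:
    """Projected subgradient method with step 1/(k+1); returns the best (smallest) dual value."""
    lam, best = 0.0, float("inf")
    for k in range(iters):
        val, g = dual_function(lam, values, weights, cap)
        best = min(best, val)
        lam = max(0.0, lam - g / (k + 1))  # projection onto lam >= 0
    return best


values, weights, cap = [6.0, 5.0, 4.0], [4.0, 3.0, 3.0], 5.0
p = primal_optimum(values, weights, cap)
d = subgradient_dual_bound(values, weights, cap)
print(p, round(d, 2))  # the dual bound exceeds the primal optimum: a duality gap
```

Here the dual bound approaches 8 while the best feasible binary choice earns 6. This small gap illustrates why a Lagrangian heuristic or branch and bound is needed to recover good primal solutions.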

As pointed out by Römisch and Schultz [99], in typical energy models one can distinguish three types of constraints. Dynamic constraints describe the relation between decisions at different stages, while nonanticipativity constraints couple the decisions associated with different scenarios. Furthermore, most energy models consist of lower-dimensional subproblems which are loosely coupled. These subordinate models are termed components. For ease of exposition, one can think of the components as models representing isolated plants in a larger power system. Then, the constraints relating decisions of different components are referred to as component coupling constraints. Lagrangian relaxation of dynamic, nonanticipativity, or component coupling constraints allows for nodal, scenario, or component decomposition of the dual objective function, respectively [99]. In [22, theorem 4.1] it is shown that scenario decomposition outperforms nodal decomposition as it always leads to a smaller duality gap. Thus, most authors focus on scenario or component decomposition (see e.g. [14] for a direct comparison).

Carpe and Schultz [15] propose a branch and bound algorithm based on 
scenario decomposition. This approach has successfully been applied to the 
solution of two-stage stochastic programs with integer requirements in the 
field of power production and trading [77]. On the other hand, Dentcheva, 
Gröwe-Kuska, Römisch, et al. solve a multistage unit commitment problem by means of component decomposition [21,49,50,73,75,76,78]. Here, a suitable Lagrangian heuristic provides approximate solutions for the primal
problem. A simplified model without commitment decisions is investigated 
in [98]. Furthermore, in [20] the direct solution by using a standard MILP 
solver is compared with the component decomposition method. 

There are many other applications of stochastic programming in the en- 
ergy industry. Among these, important contributions are due to Fleten, Wal- 
lace, and Ziemba, who present a portfolio model for a hydropower producer 
operating in a competitive electricity market [38,39]. This nonlinear pro- 
gram accounts for energy generation and a set of power contracts for delivery 
and purchase, including contracts of a financial nature. The authors do not report on numerical calculations, but they expect that Benders decomposition, SDDP, or a combination of different decomposition schemes will be best suited to solve their model. Moreover, Güssow [51] and Ostermaier [79]
solve medium-term planning problems in a deregulated market by using the 
barycentric approximation scheme. 



6.1 The Basic Decision Problem of a Hydropower Producer

Let us adopt the perspective of a hydropower producer in a liberalized electricity market. Assume that this producer has the right to operate one single pumped storage power plant over a finite planning horizon. First, choose a finite number of points in time (i.e. decision stages) indexed by $t\in\tau$, at which production and consumption decisions are selected. Define period $t$ as the interval from time $t$ to time $t+1$. Then, the stage $t$ decision vector reads

$$x_t := (x_{\mathrm{sell},t},\ x_{\mathrm{buy},t},\ x_{\mathrm{store},t}),$$

and its components are measured in MWh. $x_{\mathrm{sell},t}$ denotes the energy generated in period $t$, while $x_{\mathrm{buy},t}$ stands for the energy purchased on the market to pump water into the reservoir. For simplicity, all nonlinear effects due to water level fluctuations are neglected. Instead, it is assumed that the water level is constant, and the amount of energy stored at the end of period $t$, i.e. $x_{\mathrm{store},t}$, is proportional to the volume of water contained in the reservoir. Concretely speaking, $x_{\mathrm{store},t}$ coincides with the potential energy of the stored water multiplied by the generation efficiency.

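Using the above definitions, the owner of a pumped hydropower plant faces the following decision problem.

$$\max_{x\in\mathcal{N}^{3(T+1)}} \int_{\Theta\times\Xi} \sum_{t=0}^{T} \eta_t\,(x_{\mathrm{sell},t} - x_{\mathrm{buy},t})\,dP(\eta,\xi) \tag{6.1a}$$

$$\text{s.t.}\quad x_{\mathrm{store},t} - x_{\mathrm{store},t-1} + x_{\mathrm{sell},t} - \varepsilon_p\,x_{\mathrm{buy},t} = \xi_t \quad \forall t\in\tau \tag{6.1b}$$

$$\left.\begin{array}{l} 0 \le x_{\mathrm{sell},t} \le \bar{x}_{\mathrm{sell},t}\\ 0 \le x_{\mathrm{buy},t} \le \bar{x}_{\mathrm{buy},t}\\ \underline{x}_{\mathrm{store},t} \le x_{\mathrm{store},t} \le \bar{x}_{\mathrm{store},t} \end{array}\right\} \quad \forall t\in\tau \tag{6.1c}$$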

Thus, the power producer maximizes the expected revenues from electricity generation subject to the energy balance equation (6.1b) and the capacity constraints (6.1c). As usual, the constraints are assumed to hold almost surely with respect to the probability measure $P$. However, the attribute 'P-a.s.' is suppressed throughout this chapter in order to keep notation simple. $\eta_t$ stands for the local electricity price, which is completely exogenous and cannot be influenced by the power producer. On the other hand, $\xi_t$ characterizes the amount of energy by which the reservoir content increases in period $t$ due to natural water inflows. As outlined in the introduction, we assume that a producer in a perfectly liberalized market need not care about load demand and possible network congestion, since market clearing and stability of the transmission grid are regulated via local spot prices. Thus, we need no additional random parameters or supplementary constraints. For ease of exposition, suppose that the real-valued random variables $\eta_t$ and $\xi_t$ follow first order autoregressive processes with correlated noise.

$$\eta_t = H^0_t\,\eta_{t-1} + \varepsilon^0_t, \qquad \xi_t = H^1_t\,\xi_{t-1} + \varepsilon^1_t \tag{6.1d}$$

Notice that this specification covers the customary mean reversion processes. Besides the AR(1) coefficients $H^0_t$ and $H^1_t$, the distributional parameters of the serially independent disturbances $(\varepsilon^0_t,\varepsilon^1_t)$ must be specified. We assume that $(\varepsilon^0_t,\varepsilon^1_t)$ follows a bivariate normal distribution truncated outside some regular cross-simplex, which is large enough to contain all the data with high probability. The underlying normal distribution has mean $\mu_t := (\mu^0_t,\mu^1_t)$ and covariance matrix

$$\Sigma_t := \begin{pmatrix}(\sigma^0_t)^2 & \rho_t\,\sigma^0_t\sigma^1_t\\ \rho_t\,\sigma^0_t\sigma^1_t & (\sigma^1_t)^2\end{pmatrix}.$$



Any confidence level $\alpha_c > 0$ defines a confidence ellipse in the $(\varepsilon^0_t,\varepsilon^1_t)$-space as the smallest Borel set containing the fraction $1-\alpha_c$ of the probability mass of the underlying normal distribution. Then, choose the cross-simplex $\mathcal{E}^0_t\times\mathcal{E}^1_t$ to be the smallest two-dimensional interval covering the $1-\alpha_c$ confidence ellipse. Finally, the distribution of the disturbances $(\varepsilon^0_t,\varepsilon^1_t)$ is obtained via truncation of the normal distribution on the complement of $\mathcal{E}^0_t\times\mathcal{E}^1_t$ and normalization. The confidence ellipse is $\{\varepsilon : (\varepsilon-\mu_t)^\top\Sigma_t^{-1}(\varepsilon-\mu_t)\le n_c^2\}$, whose normal probability mass equals $1-e^{-n_c^2/2} = 1-\alpha_c$. An elementary calculation then shows

$$\mathcal{E}^0_t = [\mu^0_t - \sigma^0_t n_c,\ \mu^0_t + \sigma^0_t n_c] \quad\text{and}\quad \mathcal{E}^1_t = [\mu^1_t - \sigma^1_t n_c,\ \mu^1_t + \sigma^1_t n_c], \tag{6.2}$$

where $n_c := (\ln\alpha_c^{-2})^{1/2}$. Truncation of the normal distribution is necessary since extreme inflow scenarios could interfere with the requirement for nonanticipativity of the constraint multifunction. Without truncation, the natural inflows would exceed $\bar{x}_{\mathrm{store},t} + \bar{x}_{\mathrm{sell},t}$ with nonzero probability. This in turn implies that the energy balance equation (6.1b) could not be satisfied with probability 1. A second reason for truncation is the fact that negative spot prices or inflows lack physical meaning and should therefore be excluded. Last but not least, remember that the barycentric approximation scheme explicitly requires compact probability spaces.
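The truncation box (6.2) can be computed and sanity-checked by simulation. The Python sketch below computes $n_c = (\ln\alpha_c^{-2})^{1/2}$ and the intervals $\mu^i_t \pm \sigma^i_t n_c$. It then draws from the untruncated bivariate normal with a fixed seed to confirm two things: roughly the fraction $1-\alpha_c$ of the draws falls inside the ellipse, and the box captures at least as many draws. The parameter values and function names are hypothetical.

```python
import math
import random
from typing import List, Tuple


def truncation_radius(alpha_c: float) -> float:
    """n_c = (ln alpha_c^{-2})^{1/2}; the ellipse {q <= n_c^2} has normal mass 1 - alpha_c."""
    return math.sqrt(math.log(alpha_c ** -2))


def truncation_box(mu: Tuple[float, float], sigma: Tuple[float, float],
                   alpha_c: float) -> List[Tuple[float, float]]:
    """Intervals (6.2): mu_i -/+ sigma_i n_c, the smallest box covering the ellipse."""
    n_c = truncation_radius(alpha_c)
    return [(m - s * n_c, m + s * n_c) for m, s in zip(mu, sigma)]


def quad_form(e, mu, sigma, rho) -> float:
    """(e - mu)^T Sigma^{-1} (e - mu) for the 2x2 covariance matrix Sigma_t."""
    u = (e[0] - mu[0]) / sigma[0]
    v = (e[1] - mu[1]) / sigma[1]
    return (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho * rho)


def sample(mu, sigma, rho, rng: random.Random) -> Tuple[float, float]:
    """One draw from the untruncated bivariate normal (Cholesky construction)."""
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mu[0] + sigma[0] * z0,
            mu[1] + sigma[1] * (rho * z0 + math.sqrt(1.0 - rho * rho) * z1))


# Hypothetical parameters: price noise in EUR/MWh, inflow noise in MWh.
mu, sigma, rho, alpha_c = (4.0, 100.0), (8.0, 30.0), -0.3, 0.05
box = truncation_box(mu, sigma, alpha_c)
n_c = truncation_radius(alpha_c)
rng = random.Random(1)
draws = [sample(mu, sigma, rho, rng) for _ in range(100_000)]
in_ellipse = sum(quad_form(e, mu, sigma, rho) <= n_c ** 2 for e in draws) / len(draws)
in_box = sum(box[0][0] <= e[0] <= box[0][1] and box[1][0] <= e[1] <= box[1][1]
             for e in draws) / len(draws)
print(round(n_c, 4), box)
print(in_ellipse, in_box)
```

The box half-widths do not depend on $\rho_t$. The projection of the ellipse onto each axis is always $\mu^i_t \pm \sigma^i_t n_c$, which is why (6.2) involves only the marginal standard deviations.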



Table 6.1. Input parameters

| Param. | Description | Unit |
|---|---|---|
| $\bar{x}_{\mathrm{sell},t}$ | max. amount of energy sold on the spot market | (MWh) |
| $\bar{x}_{\mathrm{buy},t}$ | max. amount of energy bought on the spot market | (MWh) |
| $\bar{x}_{\mathrm{store},t}$ | max. reservoir content (reservoir capacity) | (MWh) |
| $\underline{x}_{\mathrm{store},t}$ | min. reservoir content (e.g. predetermined target value) | (MWh) |
| $x_{\mathrm{store},-1}$ | reservoir content at the beginning of period 0 | (MWh) |
| $\varepsilon_p$ | energy conversion factor | (-) |
| $H^0_t$ | AR(1) coefficient of the spot price process | (-) |
| $H^1_t$ | AR(1) coefficient of the natural inflow process | (-) |
| $\eta_0$ | (deterministic) electricity spot price in the first period | (€ MW⁻¹ h⁻¹) |
| $\xi_0$ | (deterministic) reservoir inflow in the first period | (MWh) |
| $\mu^0_t$ | mean value of $\varepsilon^0_t$ without truncation | (€ MW⁻¹ h⁻¹) |
| $\mu^1_t$ | mean value of $\varepsilon^1_t$ without truncation | (MWh) |
| $\sigma^0_t$ | standard deviation of $\varepsilon^0_t$ without truncation | (€ MW⁻¹ h⁻¹) |
| $\sigma^1_t$ | standard deviation of $\varepsilon^1_t$ without truncation | (MWh) |
| $\rho_t$ | correlation coefficient of $\varepsilon^0_t$ and $\varepsilon^1_t$ without truncation | (-) |
| $\alpha_c$ | confidence level for truncation | (-) |
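With the aid of (6.2) we can in fact determine the marginal spaces $\Theta_t$ and $\Xi_t$ covering the support of the spot price and the natural inflow in period $t$, respectively. To this end, rewrite $\eta_t$ and $\xi_t$ as linear combinations of the serially uncorrelated noise terms, i.e.

$$\eta_t = H^0_{1,t}\,\eta_0 + \sum_{s=1}^{t} H^0_{s+1,t}\,\varepsilon^0_s, \quad\text{where } H^0_{s,t} := \begin{cases}\prod_{s'=s}^{t} H^0_{s'} & \text{for } s\le t,\\ 1 & \text{else,}\end{cases}$$

and

$$\xi_t = H^1_{1,t}\,\xi_0 + \sum_{s=1}^{t} H^1_{s+1,t}\,\varepsilon^1_s, \quad\text{where } H^1_{s,t} := \begin{cases}\prod_{s'=s}^{t} H^1_{s'} & \text{for } s\le t,\\ 1 & \text{else.}\end{cases}$$

Under the reasonable assumption that the AR(1) coefficients in (6.1d) are nonnegative, we obtain

$$\Theta_t = \Big[H^0_{1,t}\eta_0 + \sum_{s=1}^{t} H^0_{s+1,t}(\mu^0_s - \sigma^0_s n_c),\ H^0_{1,t}\eta_0 + \sum_{s=1}^{t} H^0_{s+1,t}(\mu^0_s + \sigma^0_s n_c)\Big],$$

$$\Xi_t = \Big[H^1_{1,t}\xi_0 + \sum_{s=1}^{t} H^1_{s+1,t}(\mu^1_s - \sigma^1_s n_c),\ H^1_{1,t}\xi_0 + \sum_{s=1}^{t} H^1_{s+1,t}(\mu^1_s + \sigma^1_s n_c)\Big].$$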
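The interval formulas for $\Theta_t$ and $\Xi_t$ can also be obtained by a simple forward recursion. With nonnegative coefficients, the map $x \mapsto H_t x + \varepsilon$ preserves order, so the lower (upper) end of the interval is propagated with the lower (upper) end of the noise box. The Python sketch below compares this recursion with the closed form above, and it simulates paths whose noise is drawn anywhere in the truncation box. The parameter values (including the rounded $n_c \approx 2.4477$ for $\alpha_c = 0.05$) and all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def support_recursive(x0: float, H: Sequence[float], mu: Sequence[float],
                      sigma: Sequence[float], n_c: float) -> List[Tuple[float, float]]:
    """Intervals [lo_t, hi_t], t = 1..T, covering x_t = H_t x_{t-1} + e_t when every
    e_t lies in [mu_t - sigma_t n_c, mu_t + sigma_t n_c] and x0 is deterministic.
    Requires H_t >= 0 so that the map x -> H_t x + e preserves the ordering."""
    lo = hi = x0
    out = []
    for h, m, s in zip(H, mu, sigma):
        if h < 0:
            raise ValueError("AR(1) coefficients must be nonnegative")
        lo, hi = h * lo + (m - s * n_c), h * hi + (m + s * n_c)
        out.append((lo, hi))
    return out


def H_prod(H: Sequence[float], s: int, t: int) -> float:
    """H_{s,t} = prod_{s'=s}^{t} H_{s'} for s <= t and 1 otherwise (1-based indices)."""
    p = 1.0
    for k in range(s, t + 1):   # empty loop, i.e. product 1, when s > t
        p *= H[k - 1]
    return p


def support_explicit(x0, H, mu, sigma, n_c, t: int) -> Tuple[float, float]:
    """Closed form of the interval for period t, as in the formulas for Theta_t and Xi_t."""
    base = H_prod(H, 1, t) * x0
    lo = base + sum(H_prod(H, s + 1, t) * (mu[s - 1] - sigma[s - 1] * n_c) for s in range(1, t + 1))
    hi = base + sum(H_prod(H, s + 1, t) * (mu[s - 1] + sigma[s - 1] * n_c) for s in range(1, t + 1))
    return lo, hi


# Hypothetical mean-reverting spot price: eta_t = 0.9 eta_{t-1} + eps_t with mean 4.
T, x0, n_c = 6, 40.0, 2.4477
H, mu, sigma = [0.9] * T, [4.0] * T, [1.0] * T
intervals = support_recursive(x0, H, mu, sigma, n_c)
for t in range(1, T + 1):
    print(t, intervals[t - 1], support_explicit(x0, H, mu, sigma, n_c, t))

rng = random.Random(0)
for _ in range(1000):                     # noise drawn anywhere in the truncation box
    x = x0
    for t in range(T):
        x = H[t] * x + rng.uniform(mu[t] - sigma[t] * n_c, mu[t] + sigma[t] * n_c)
        lo, hi = intervals[t]
        assert lo - 1e-9 <= x <= hi + 1e-9
```

If some coefficient were negative, the ends of the interval would swap roles at that stage. This is why the nonnegativity assumption is needed for the simple formulas.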



The relevant input parameters for the optimization problem (6.1) are listed in Table 6.1. It is worth remarking that the energy conversion factor $\varepsilon_p < 1$ can be expressed as the pumping efficiency multiplied by the generating efficiency and accounts for the fact that electrical energy is not storable. In other words, only the fraction $\varepsilon_p$ of a certain amount of energy bought on the spot market can later be reused. The rest is dissipated in the equipment of the power plant. For real pumped storage power plants the conversion factor $\varepsilon_p$ amounts to approximately 75%.

It can easily be verified that the linear stochastic program (6.1) satisfies the regularity conditions (B1)-(B4). Moreover, assumption (B5) holds if the reservoir targets $\underline{x}_{\mathrm{store},t}$ and $\bar{x}_{\mathrm{store},t}$ as well as the confidence level $\alpha_c$ for truncation of the normal distribution are chosen appropriately, which will be implicitly assumed in the following. These conditions guarantee that the barycentric approximation scheme provides arbitrarily tight upper and lower bounds on the optimal value. In the sequel, we develop specific generalizations of the basic model (6.1).



6.2 Market Power 

In Sect. 6.1 it is assumed that the amount of energy supplied (or bought) by the hydro generator does not influence the spot price. Thus, in a perfectly competitive energy market the spot price behaves like an exogenous stochastic process, implying that the generator has no market power. In an oligopolistic market, however, the price-taking assumption should be relaxed. Let us therefore assume that the electricity spot price is a monotonically decreasing linear affine function of the net energy production $x_{\mathrm{net},t} := x_{\mathrm{sell},t} - x_{\mathrm{buy},t}$:

$$\delta_t(x_{\mathrm{net},t}) := \eta_t\,(1 - c_t\,x_{\mathrm{net},t})$$

Here, $S_t$ denotes the spot price in period $t$ contingent on the trading volume of electric energy, whereas the random variable $\eta_t$ characterizes a reference price, which is realized if the hydro power plant under consideration is shut down. This approach is consistent with the model developed in [5]. Using the terminology of microeconomics, $S_t$ may be interpreted as an inverse demand function in an oligopolistic market [71, Chap. 12]. In fact, the output levels of the competitors can be viewed as exogenous random parameters which are implicitly taken into account in the reference price $\eta_t$. The nonnegative constant $c_t$ is measured in (MWh)$^{-1}$ and determines the inverse demand elasticity with respect to net production. It is assumed that $c_t^{-1} > \max\{\bar x_{\mathrm{sell},t}, \bar x_{\mathrm{buy},t}\}$ such that the spot price remains positive for all feasible production and consumption decisions. Substituting $S_t$ into the objective function (6.1a), we obtain

$$\int_{\Theta\times\Xi} \Big(\sum_{t=0}^{T} S_t(x_{\mathrm{net},t})\,x_{\mathrm{net},t}\Big)\,dP(\eta,\xi). \qquad (6.3)$$

Since $c_t > 0$, we find that

$$p_t(x_{\mathrm{net},t}, \eta_t) = \eta_t\,\big(x_{\mathrm{net},t} - c_t\,x_{\mathrm{net},t}^2\big) =: \eta_t\,f_t(x_{\mathrm{net},t})$$

is a continuous saddle function, concave (quadratic) in $x_{\mathrm{net},t}$ and convex (linear) in $\eta_t$ on its natural domain. Due to the quadratic terms in the objective function, the hydro scheduling problem with market power does not represent a linear stochastic program. However, the conditions (B1)-(B5) remain valid, and the barycentric approximation scheme is applicable. In order to solve the discretized auxiliary stochastic programs (4.40) by means of an LP solver, the quadratic terms in (6.3) must be approximated by piecewise linear functions. To this end, we introduce a dummy variable $x_{\mathrm{aux},t}$ and write

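$$\begin{aligned} f_t(x_{\mathrm{net},t}) &= \sup\{x_{\mathrm{aux},t} \mid x_{\mathrm{aux},t} \le f_t(x_{\mathrm{net},t})\}\\ &\approx \sup\{x_{\mathrm{aux},t} \mid x_{\mathrm{aux},t} \le f^0_{t,J,i} + f_{t,J,i}\,x_{\mathrm{net},t},\ i \in \mathcal{I}_J\} \qquad (6.4)\\ &=: f_{t,J}(x_{\mathrm{net},t}), \end{aligned}$$

where $\mathcal{I}_J := \{1, \dots, I(J)\}$ is a finite index set, and

$$f^0_{t,J,i},\ f_{t,J,i} \in \mathbb{R} \qquad \forall t \in \tau,\ J \in \mathbb{N},\ i \in \mathcal{I}_J.$$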

In other words, $f_{t,J}$ is given by the lower envelope of the affine functions $f^0_{t,J,i} + f_{t,J,i}\,x_{\mathrm{net},t}$, $i \in \mathcal{I}_J$. The parameter $J$ labels the different piecewise linear approximations $f_{t,J}$ of the quadratic function $f_t$. In the remainder, let us assume that the sequence $\{f_{t,J}\}_{J\in\mathbb{N}}$ converges to $f_t$ uniformly on the compact interval $[-\bar x_{\mathrm{buy},t}, \bar x_{\mathrm{sell},t}]$.
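The text does not prescribe how the affine pieces are chosen. One simple possibility, used here purely for illustration, takes the chords of $f_t$ between equidistant breakpoints: because $f_t$ is concave, the lower envelope of these chords coincides with the piecewise linear interpolant, and for breakpoint spacing $h$ the uniform error equals $c_t h^2/4$, which delivers the uniform convergence assumed above. The sketch borrows $c_t$ and the trading limits from Sect. 6.6; the function names are hypothetical.

```python
def chord_pieces(f, a, b, n):
    """(intercept, slope) of the chords of f on n equal subintervals of [a, b]."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    pieces = []
    for x0, x1 in zip(xs, xs[1:]):
        slope = (f(x1) - f(x0)) / (x1 - x0)
        pieces.append((f(x0) - slope * x0, slope))
    return pieces


def lower_envelope(pieces, x):
    """min_i (f0_i + f1_i * x), i.e. the value of sup{x_aux : x_aux <= f0_i + f1_i * x}."""
    return min(f0 + f1 * x for f0, f1 in pieces)


def sup_error(f, pieces, a, b, grid=2000):
    """Maximum of |f - envelope| on an equidistant grid of [a, b]."""
    xs = (a + (b - a) * k / grid for k in range(grid + 1))
    return max(abs(f(x) - lower_envelope(pieces, x)) for x in xs)


C = 2.121e-06                 # inverse demand elasticity of Sect. 6.6.4 (1/MWh)
A, B = -3.360e4, 1.344e5      # [-x_buy, x_sell] from Table 6.2 (MWh)


def f(x):
    return x - C * x * x      # f_t(x_net) = x_net - c_t * x_net^2


for n in (5, 10, 20):
    print(n, round(sup_error(f, chord_pieces(f, A, B, n), A, B), 1))
```

Chords approximate $f_t$ from below; the linearization actually used in Sect. 6.6.4 shifts its pieces upward to balance the error, which roughly halves the maximum deviation.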

Denote by $(E^l\Phi^l_{0,J})$ and $(E^u\Phi^u_{0,J})$ the optimal values of the hydro scheduling problem with market power if the quadratic functions $f_t$ are approximated by the piecewise linear functions $f_{t,J}$ and the original measure $P$ is substituted by the discrete barycentric measures $P^l$ and $P^u$, respectively.¹ Notice that the difference between these optimal values can be made small by suitably refining the barycentric probability measures. Next, for a given tolerance $\varepsilon > 0$ there is a parameter $J_0(\varepsilon)$ such that

$$|f_t - f_{t,J}| \le \frac{\varepsilon}{(T+1)\,E(\eta_t)} \qquad \forall t \in \tau,\ J \ge J_0(\varepsilon) \qquad (6.5)$$

uniformly on $[-\bar x_{\mathrm{buy},t}, \bar x_{\mathrm{sell},t}]$. Then, theorem 4.7 together with an elementary stability result implies that

$$(E^l\Phi^l_{0,J}) - \varepsilon \le (E^l\Phi^l_0) \le (E\Phi_0) \le (E^u\Phi^u_0) \le (E^u\Phi^u_{0,J}) + \varepsilon.$$

Thus, the optimal value $(E\Phi_0)$ of the hydro scheduling problem with market power has upper and lower bounds which are computationally accessible by means of an LP solver. Moreover, the bounds can be made arbitrarily tight by refining the discrete barycentric measures and by improving the piecewise linear approximations of the quadratic terms in the objective function.



6.3 Lognormal Spot Prices 

In the basic model (6.1) it is assumed that the spot price process is governed by a simple Brownian motion with mean reversion. As mentioned above, this specification allows the spot price to drop below zero if the involved normal distributions are not suitably truncated. Moreover, the magnitude of the price fluctuations is independent of the current price level. However, actual electricity prices are nonnegative, and price fluctuations are expected to be higher when the price level is high. Geometric Brownian motion exhibits exactly these characteristics, and thus it is more adequate for modelling spot prices. Subsequently, we assume that the spot price at time $t$ is lognormally distributed, $S_t := \exp(\eta_t)$, where $\eta_t$ follows an AR(1) process of the form (6.1d).² Then, the objective function can be rewritten as

1 Here, the refinement parameter corresponding to the barycentric measures is suppressed. Otherwise, in order to avoid any ambiguity, the refinement parameter and the parameter labelling the piecewise linearization should be denoted by $J_1$ and $J_2$, respectively.

2 Lognormal electricity prices with mean reversion have previously been studied 
by Deng et al. [19]. 
$$\int_{\Theta\times\Xi} \Big(\sum_{t=0}^{T} \exp(\eta_t)\,x_{\mathrm{net},t}\Big)\,dP(\eta,\xi), \qquad (6.6)$$



where $x_{\mathrm{net},t} := x_{\mathrm{sell},t} - x_{\mathrm{buy},t}$ as in the previous section. Unfortunately, the resulting stochastic program does not fulfill the regularity conditions (B1)-(B5). In fact, for $x_{\mathrm{net},t} < 0$ the stage $t$ profit function is concave in $\eta_t$, which contradicts condition (B2). Consequently, the auxiliary value functions do not necessarily provide bounds for the optimal value of the recourse problem under consideration. However, it can easily be verified that the hydro scheduling problem with lognormal spot prices represents a linear stochastic program subject to the generalized regularity conditions (C1)-(C5).³ In order to derive bounds on its optimal value, we first have to identify suitable correction terms $\{a_t\}_{t\in\tau}$ in the sense of definition 5.1. Since $\eta_0$ is deterministic, a possible choice is



$$a_t(\eta^t) := \begin{cases} \bar X_{\mathrm{net},t}\,\exp(\eta_t) & \text{for } t = 1, \dots, T,\\ 0 & \text{for } t = 0, \end{cases}$$

with $\bar X_{\mathrm{net},t} \ge \bar x_{\mathrm{buy},t}$ arbitrary. Then $p_t + a_t = (x_{\mathrm{net},t} + \bar X_{\mathrm{net},t})\exp(\eta_t)$ is convex in $\eta_t$, since $x_{\mathrm{net},t} \ge -\bar x_{\mathrm{buy},t}$. According to theorem 5.5, the bounds on $(E\Phi_0)$ consist of the auxiliary value functions of the barycentric approximation scheme shifted by a combination of the expectation values $(EA_0)$, $(E^lA^l_0)$, and $(E^uA^u_0)$. The auxiliary value functions as well as $(E^lA^l_0)$ and $(E^uA^u_0)$ are conveniently evaluated numerically. In contrast, the expectation $(EA_0)$ with respect to the truncated normal distribution is analytically intractable. One can either evaluate $(EA_0)$ by numerical integration or, provided that the confidence level $\alpha_c$ is small enough, neglect truncation and calculate the expectation with respect to the unrestricted normal distribution. To this end, rewrite $\eta_t$ as a linear combination of the noise terms, i.e.



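$$\eta_t = H^0_{1,t}\,\eta_0 + \sum_{s=1}^{t} H^0_{s+1,t}\,\varepsilon^0_s, \qquad\text{where}\qquad H^0_{s,t} := \begin{cases} \prod_{k=s}^{t} H^0_k & \text{for } s \le t,\\ 1 & \text{else.} \end{cases}$$

Then, disregarding truncation, we obtain

$$(EA_0) = \sum_{t=1}^{T} \bar X_{\mathrm{net},t}\,E\big(\exp(\eta_t)\big) \approx \sum_{t=1}^{T} \bar X_{\mathrm{net},t}\,\exp\Big(H^0_{1,t}\eta_0 + \sum_{s=1}^{t}\Big[H^0_{s+1,t}\mu^0_s + \tfrac{1}{2}\big(H^0_{s+1,t}\sigma^0_s\big)^2\Big]\Big).$$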
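The untruncated approximation is easy to evaluate, since each $\exp(\eta_t)$ is then lognormal with mean $\exp(m_t + v_t/2)$. The Python sketch below assumes time-invariant parameters, serially independent noise terms $\varepsilon^0_s \sim N(\mu^0, (\sigma^0)^2)$, and a constant $\bar X_{\mathrm{net},t}$; the inputs are those of the lognormal-price experiment in Sect. 6.6.5 (Table 6.7 and $\bar X_{\mathrm{net},t} =$ 3.360E+04 MWh). Function names are hypothetical.

```python
import math


def expected_exp_untruncated(x0, H, mu, sigma, t):
    """E[exp(eta_t)] for eta_t = H_{1,t} eta_0 + sum_s H_{s+1,t} eps_s with
    independent eps_s ~ N(mu, sigma^2), ignoring truncation (constant parameters)."""
    mean = H ** t * x0 + sum(H ** (t - s) * mu for s in range(1, t + 1))
    var = sum((H ** (t - s) * sigma) ** 2 for s in range(1, t + 1))
    return math.exp(mean + 0.5 * var)   # lognormal mean exp(m + v/2)


def expected_correction_untruncated(x0, H, mu, sigma, x_net_bar, T):
    """(EA_0) = sum_{t=1}^T Xbar_net * E[exp(eta_t)] with a constant Xbar_net."""
    return sum(x_net_bar * expected_exp_untruncated(x0, H, mu, sigma, t)
               for t in range(1, T + 1))


# Lognormal-price parameters of Table 6.7 with Xbar_net,t = 3.360E+04 MWh.
ea0 = expected_correction_untruncated(3.219, 0.2608, 2.048, 0.4274, 3.360e4, 5)
print(f"(EA_0) without truncation: {ea0:.4e} EUR")
```

For these parameters the untruncated value comes out at roughly 3.05E+06 €, somewhat above the truncated value reported in Sect. 6.6.5, because symmetric truncation of the normal noise lowers the mean of its exponential.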




6.4 Lognormal Natural Inflows 

For similar reasons as in the case of the spot prices, it is meaningful to work with lognormally distributed reservoir inflows. Let us therefore assume that the natural inflows in period $t$ are given by $\exp(\xi_t)$, where $\xi_t$ follows an AR(1) process of the form (6.1d). This assumption requires a modification of the energy balance equation.

3 Of course, we could equivalently invoke the regularity conditions (F1)-(F5).



$$x_{\mathrm{store},t} - x_{\mathrm{store},t-1} + x_{\mathrm{sell},t} - e_p\,x_{\mathrm{buy},t} - \exp(\xi_t) = 0 \qquad (6.7)$$

It is straightforward to verify that the corresponding constraint function is nonlinear in $\xi_t$, an obvious violation of condition (B3). We may thus conclude that the auxiliary value functions do not provide strict bounds on the recourse functions of the underlying stochastic program. All the same, one can verify that the hydro scheduling problem with lognormal inflows fulfills the generalized regularity conditions (D1)-(D5).⁴ In order to employ theorem 5.14, let us look for a sequence of correction terms $\{a_t\}_{t\in\tau}$ as in (5.6). To this end, we study the dynamic version of the hydro scheduling problem at hand. Restricting attention to the parametric stage $t$ subproblem, we should find an upper bound $\bar D_{\mathrm{bal},t}$ for the dual variable associated with the energy balance equation (6.7), uniformly over all outcome and decision histories in $Z^t$. In the economic literature, this dual variable is usually referred to as the shadow price of energy. As a matter of fact, it can be interpreted as the maximum price the hydropower producer would be ready to pay at time $t$ for the use of a small additional amount of energy $\Delta x_{\mathrm{store},t-1}$. This observation implies that the dual variable of restriction (6.7) is nonnegative, since more energy in the reservoir never results in lower profits. Without loss of generality, we may therefore interpret the energy balance equation (6.7) as a 'less-or-equal' constraint. A small additional amount of energy $\Delta x_{\mathrm{store},t-1}$ in the reservoir can either be sold on the spot market or, in times of scarcity, be used to meet the reservoir targets in the subsequent periods. In the former case, selling the extra amount of energy on the spot market results in an excess profit of at most $\eta_{\max,t,T}\,\Delta x_{\mathrm{store},t-1}$, where

$$\eta_{\max,t,T} := \max\{\sup\{\eta_s \in \Theta_s\} \mid s = t, \dots, T\}.$$

In the latter case, the hydropower producer saves at most a lump sum of $(\eta_{\max,t,T}/e_p)\,\Delta x_{\mathrm{store},t-1}$, since replacing the extra energy would otherwise require buying the amount $\Delta x_{\mathrm{store},t-1}/e_p$ on the spot market. Notice that $\eta_{\max,t,T}$ tends to infinity as the confidence level $\alpha_c$ for truncation becomes small. Under the given assumptions, $\eta_{\max,t,T}$ can be calculated analytically, i.e. it can be expressed as a function of the basic input parameters in Table 6.1. By the above reasoning, any real number $\bar D_{\mathrm{bal},t} > \eta_{\max,t,T}/e_p$ represents a strict uniform upper bound for the shadow price of energy at time $t$. Being aware of this result, we are now prepared to define correction terms as in (5.6). Since the random variable $\xi_0$ is deterministic, a suitable choice is

$$a_t(\xi^t) := \begin{cases} -\bar D_{\mathrm{bal},t}\,\exp(\xi_t) & \text{for } t = 1, \dots, T,\\ 0 & \text{for } t = 0. \end{cases}$$
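Because the AR(1) coefficient of the price process is nonnegative, the upper endpoint of $\Theta_s$ obeys $\sup\Theta_s = H^0_s \sup\Theta_{s-1} + \mu^0_s + \sigma^0_s n_c$, so $\eta_{\max,t,T}$ and the threshold for $\bar D_{\mathrm{bal},t}$ are easy to tabulate. The Python sketch below assumes the time-invariant price parameters of Table 6.3 and $e_p = 0.75$ from Table 6.2; the function names are hypothetical. It also illustrates numerically that the threshold grows with $n_c$, i.e. as $\alpha_c$ becomes small.

```python
def eta_max_path(eta0, H, mu, sigma, n_c, T):
    """sup Theta_t for t = 0..T (constant parameters, H >= 0)."""
    hi = [eta0]
    for _ in range(T):
        hi.append(H * hi[-1] + mu + sigma * n_c)
    return hi


def shadow_price_thresholds(eta0, H, mu, sigma, n_c, T, e_p):
    """eta_max,t,T / e_p for t = 0..T; any D_bal,t strictly above it qualifies."""
    hi = eta_max_path(eta0, H, mu, sigma, n_c, T)
    return [max(hi[t:]) / e_p for t in range(T + 1)]


for n_c in (2.0, 3.0, 4.0):
    thresholds = shadow_price_thresholds(25.0, 0.2608, 18.48, 6.869, n_c, 5, 0.75)
    print(n_c, [round(v, 2) for v in thresholds])
```

With these parameters the support endpoints increase over time, so $\eta_{\max,t,T}$ is attained in the last stage and the threshold is the same for every $t$.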

4 Here again, we could equivalently work with the regularity conditions (F1)-(F5) 
for linear stochastic programs. 
According to theorem 5.14, the bounds on $(E\Phi_0)$ consist of the auxiliary value functions of the barycentric approximation scheme shifted by a combination of the expectation values $(EA_0)$, $(E^lA^l_0)$, and $(E^uA^u_0)$. As already mentioned in the previous section, the auxiliary value functions as well as $(E^lA^l_0)$ and $(E^uA^u_0)$ can be treated numerically. In addition, $(EA_0)$ can either be evaluated by numerical integration or by neglecting truncation and calculating the expectation with respect to the unrestricted normal distribution. To this end, rewrite $\xi_t$ as a linear combination of the noise terms, i.e.

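$$\xi_t = H^1_{1,t}\,\xi_0 + \sum_{s=1}^{t} H^1_{s+1,t}\,\varepsilon^1_s, \qquad\text{where}\qquad H^1_{s,t} := \begin{cases} \prod_{k=s}^{t} H^1_k & \text{for } s \le t,\\ 1 & \text{else.} \end{cases}$$

Then, in analogy to the previous section, we find

$$(EA_0) = -\sum_{t=1}^{T} \bar D_{\mathrm{bal},t}\,E\big(\exp(\xi_t)\big) \approx -\sum_{t=1}^{T} \bar D_{\mathrm{bal},t}\,\exp\Big(H^1_{1,t}\xi_0 + \sum_{s=1}^{t}\Big[H^1_{s+1,t}\mu^1_s + \tfrac{1}{2}\big(H^1_{s+1,t}\sigma^1_s\big)^2\Big]\Big).$$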



6.5 Risk Aversion 



A more general objective function than (6.1a) is

$$\int_{\Theta\times\Xi} U\Big(\sum_{t=0}^{T} \eta_t\,x_{\mathrm{net},t}\Big)\,dP(\eta,\xi), \qquad (6.8)$$



where $U: \mathbb{R} \to \mathbb{R}$ is a utility function of von Neumann-Morgenstern type. Here, $U$ is twice continuously differentiable and concave so as to reflect risk aversion.⁵ The standard approach for firms to manage risk is not to use concave utility functions, but to mitigate risk through net present value calculations under a risk-adjusted probability measure. However, the expected utility approach is adequate if a firm has only a few owners with limited opportunities to diversify. In order to keep notation simple, set $x_{\mathrm{net},t} := x_{\mathrm{sell},t} - x_{\mathrm{buy},t}$ for all $t \in \tau$ and $x^T_{\mathrm{net}} := (x_{\mathrm{net},0}, \dots, x_{\mathrm{net},T})$. These definitions are consistent with previous conventions. Clearly, the objective function (6.8) can be interpreted as the expected value of a sum of profit functions

5 Utility functions of wealth accumulated over the entire planning horizon are used, e.g., in [1]. However, the present exposition could also be based on more general utility functions as in [38], provided that they are concave in the decision variables and sufficiently smooth. For instance, consumption-oriented models usually invoke time-separable utility.
$$p_t(x^t, \eta^t) := \begin{cases} U\big(\langle \eta^T, x^T_{\mathrm{net}}\rangle\big) & \text{for } t = T,\\ 0 & \text{for } t = 0, \dots, T-1. \end{cases}$$



This convention fits into the formal framework of Sect. 2.4. It can easily be checked that $p_T$ is concave in $x^T$ for every fixed value of $\eta^T$ and concave in $\eta^T$ for all fixed values of $x^T$. In other words, $p_T$ is biconcave (but not jointly concave) in $\eta^T$ and $x^T$, implying that condition (B2) is not fulfilled. Hence, the auxiliary value functions of barycentric approximation do not furnish bounds on the optimal objective function value. However, the hydro scheduling problem with risk aversion represents a convex stochastic program which satisfies the generalized regularity conditions (C1)-(C5). This is a direct consequence of proposition 5.3. In order to find suitable correction terms $\{a_t\}_{t\in\tau}$ as in definition 5.1, let us estimate the curvature of the profit function $p_T$. An elementary calculation shows (cf. also proposition 5.3)

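$$\big\|\nabla_{\eta^T} \otimes \nabla_{\eta^T}\, p_T(x^T, \eta^T)\big\|_2 = \|x^T_{\mathrm{net}}\|_2^2\,\big|U''\big(\langle \eta^T, x^T_{\mathrm{net}}\rangle\big)\big|,$$

where $U''$ stands for the second derivative of the utility function $U$. Then, any strictly positive real number

$$\gamma_T > \sup\Big\{\|x^T_{\mathrm{net}}\|_2^2\,\big|U''\big(\langle \eta^T, x^T_{\mathrm{net}}\rangle\big)\big| \;\Big|\; (x^T, \eta^T, \xi^T) \in Y^T\Big\}$$

represents an upper bound for the curvature of $p_T$ along $\eta^T$ on a neighborhood of $Y^T$. Thus, a possible choice for the correction terms is

$$a_t(\eta^t) := \begin{cases} \tfrac{\gamma_T}{2}\,\langle \eta^T, \eta^T\rangle & \text{for } t = T,\\ 0 & \text{for } t = 0, \dots, T-1. \end{cases}$$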



As implied by theorem 5.5, the bounds on $(E\Phi_0)$ are made up of the auxiliary value functions and the expectation values $(EA_0)$, $(E^lA^l_0)$, and $(E^uA^u_0)$. The numerical evaluation of the auxiliary value functions as well as $(E^lA^l_0)$ and $(E^uA^u_0)$ poses no severe difficulties. In contrast, the expectation $(EA_0)$ with respect to the truncated normal distribution is analytically intractable. Thus, one can attempt to evaluate $(EA_0)$ by numerical integration. Alternatively, one may neglect truncation and calculate the expectation with respect to the unrestricted normal distribution. To this end, express $\eta_t$ as a linear combination of the noise terms, as exemplified in Sect. 6.3. Then, disregarding truncation, we obtain



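$$(EA_0) = E\big(a_T(\eta^T)\big) \approx \frac{\gamma_T}{2}\sum_{t=0}^{T}\Big[\Big(H^0_{1,t}\eta_0 + \sum_{s=1}^{t} H^0_{s+1,t}\mu^0_s\Big)^2 + \sum_{s=1}^{t}\big(H^0_{s+1,t}\sigma^0_s\big)^2\Big].$$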
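Disregarding truncation, $(EA_0)$ only requires the first two moments of each $\eta_t$, since $E(\eta_t^2) = (E\eta_t)^2 + \mathrm{Var}(\eta_t)$. The Python sketch below assumes time-invariant parameters (here those of Table 6.3) and uses a hypothetical value `GAMMA_T`, which in practice has to exceed the supremum above for the chosen utility function; the function names are also hypothetical.

```python
def second_moments(eta0, H, mu, sigma, T):
    """E[eta_t^2] = (mean of eta_t)^2 + Var(eta_t), t = 0..T, ignoring truncation."""
    moments = []
    for t in range(T + 1):
        mean = H ** t * eta0 + sum(H ** (t - s) * mu for s in range(1, t + 1))
        var = sum((H ** (t - s) * sigma) ** 2 for s in range(1, t + 1))
        moments.append(mean ** 2 + var)
    return moments


def expected_quadratic_correction(gamma_T, eta0, H, mu, sigma, T):
    """(EA_0) = gamma_T / 2 * sum_{t=0}^T E[eta_t^2] for a_T = gamma_T/2 <eta^T, eta^T>."""
    return 0.5 * gamma_T * sum(second_moments(eta0, H, mu, sigma, T))


GAMMA_T = 1.0e-3   # hypothetical curvature bound; must exceed the supremum in the text
m = second_moments(25.0, 0.2608, 18.48, 6.869, 5)
print([round(v, 2) for v in m])
print(expected_quadratic_correction(GAMMA_T, 25.0, 0.2608, 18.48, 6.869, 5))
```

Because $\eta_0 = 25$ coincides with the stationary mean $\mu^0_t/(1 - H^0_t)$ for the parameters of Table 6.3, the means stay at 25 and only the variances grow with $t$.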



Notice that the hydro scheduling problem with risk aversion represents a convex stochastic program with nonlinear objective. In order to solve the associated auxiliary stochastic programs (4.40) by means of an LP solver, the nonlinear terms in (6.8) must be approximated by piecewise linear functions. To this end, it is useful to introduce an additional decision variable $x_{\mathrm{rev},t}$ for the total revenues earned until the end of period $t$. The evolution of $x_{\mathrm{rev},t}$ is determined by the revenue balance equation



$$x_{\mathrm{rev},t} - x_{\mathrm{rev},t-1} - \eta_t\,x_{\mathrm{net},t} = 0 \qquad \forall t \in \tau, \qquad (6.9)$$



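and initial wealth $x_{\mathrm{rev},-1}$ is set to zero. For the further argumentation we need rough upper and lower estimates for the terminal wealth $x_{\mathrm{rev},T}$ uniformly over all outcomes and all decision strategies feasible in (4.40). A suitable choice is

$$\underline{x}_{\mathrm{rev},T} := -\sum_{t=0}^{T} \eta_{\max,t}\,\bar x_{\mathrm{buy},t}, \qquad \bar x_{\mathrm{rev},T} := \sum_{t=0}^{T} \eta_{\max,t}\,\bar x_{\mathrm{sell},t},$$

where $\eta_{\max,t} := \sup\{\eta_t \in \Theta_t\}$.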



With these conventions, the stage $T$ profit function reduces to $U(x_{\mathrm{rev},T})$ and can conveniently be approximated by a piecewise linear function:

$$\begin{aligned} U(x_{\mathrm{rev},T}) &= \sup\{x_{\mathrm{aux},T} \mid x_{\mathrm{aux},T} \le U(x_{\mathrm{rev},T})\}\\ &\approx \sup\{x_{\mathrm{aux},T} \mid x_{\mathrm{aux},T} \le U^0_{J,i} + U_{J,i}\,x_{\mathrm{rev},T},\ i \in \mathcal{I}_J\} \qquad (6.10)\\ &=: U_J(x_{\mathrm{rev},T}). \end{aligned}$$

Here, $x_{\mathrm{aux},T}$ is a dummy variable, $\mathcal{I}_J := \{1, \dots, I(J)\}$ is a finite index set, and

$$U^0_{J,i},\ U_{J,i} \in \mathbb{R} \qquad \forall J \in \mathbb{N},\ i \in \mathcal{I}_J.$$

Explicitly speaking, $U_J$ is given by the lower envelope of the affine functions $U^0_{J,i} + U_{J,i}\,x_{\mathrm{rev},T}$, $i \in \mathcal{I}_J$. The parameter $J$ labels the different piecewise linear approximations $U_J$ of the concave utility function $U$. In the remainder, let us assume that the sequence $\{U_J\}_{J\in\mathbb{N}}$ converges to $U$ uniformly on the compact interval $[\underline{x}_{\mathrm{rev},T}, \bar x_{\mathrm{rev},T}]$.
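The source leaves open how the affine pieces in (6.10) are chosen. As one hypothetical possibility, tangents of $U$ at equally spaced points can be used: since $U$ is concave, each tangent lies above $U$, their lower envelope approximates $U$ from above, and the uniform error shrinks roughly quadratically with the number of tangents. The Python sketch assumes an exponential utility with a hypothetical risk-aversion coefficient `LAMBDA` and a hypothetical wealth interval; none of these values come from the text.

```python
import math

LAMBDA = 1.0e-7   # hypothetical absolute risk aversion (1/EUR)


def utility(w):
    """Hypothetical concave utility U(w) = (1 - exp(-LAMBDA w)) / LAMBDA."""
    return (1.0 - math.exp(-LAMBDA * w)) / LAMBDA


def marginal_utility(w):
    return math.exp(-LAMBDA * w)


def tangent_pieces(a, b, n):
    """(U0_i, U_i): tangents of U at the midpoints of n equal cells of [a, b]."""
    points = [a + (b - a) * (k + 0.5) / n for k in range(n)]
    return [(utility(p) - marginal_utility(p) * p, marginal_utility(p)) for p in points]


def u_lin(pieces, w):
    """Lower envelope min_i (U0_i + U_i w); each tangent lies above the concave U."""
    return min(u0 + u1 * w for u0, u1 in pieces)


def sup_gap(pieces, a, b, grid=5000):
    """Largest overestimate u_lin - U on an equidistant grid of [a, b]."""
    ws = [a + (b - a) * k / grid for k in range(grid + 1)]
    return max(u_lin(pieces, w) - utility(w) for w in ws)


A, B = -5.0e6, 2.0e7   # hypothetical wealth interval [x_rev_min, x_rev_max] in EUR
for n in (5, 10, 20, 40):
    print(n, round(sup_gap(tangent_pieces(A, B, n), A, B), 1))
```

Tangent pieces give an outer approximation of the epigraph constraint, whereas chords would approximate $U$ from below; either choice converges uniformly as the number of pieces grows.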

In analogy to Sect. 6.2, denote by $(E^l\Phi^l_{0,J})$ and $(E^u\Phi^u_{0,J})$ the optimal values of the hydro scheduling problem with risk aversion if the nonlinear utility function $U$ is approximated by the piecewise linear function $U_J$, and the original measure $P$ is substituted by the barycentric measures $P^l$ and $P^u$, respectively.⁶ Next, for a given tolerance $\varepsilon > 0$ there is a parameter $J_0(\varepsilon)$ such that

$$|U_J - U| \le \varepsilon \qquad \forall J \ge J_0(\varepsilon)$$

uniformly on $[\underline{x}_{\mathrm{rev},T}, \bar x_{\mathrm{rev},T}]$. Then, theorem 5.5 together with an elementary stability result implies that

6 As usual, the refinement parameter corresponding to the barycentric measures is suppressed. Otherwise, for the sake of consistent notation, the refinement parameter and the parameter labelling the piecewise linearization should be denoted by $J_1$ and $J_2$, respectively.
$$\begin{aligned} (E^l\Phi^l_{0,J}) + (E^lA^l_0) - (EA_0) - \varepsilon &\le (E^l\Phi^l_0) + (E^lA^l_0) - (EA_0)\\ &\le (E\Phi_0)\\ &\le (E^u\Phi^u_0) + (E^uA^u_0) - (EA_0)\\ &\le (E^u\Phi^u_{0,J}) + (E^uA^u_0) - (EA_0) + \varepsilon. \end{aligned}$$

Thus, the optimal value $(E\Phi_0)$ of the hydro scheduling problem with risk aversion has upper and lower bounds which are computationally accessible by means of an LP solver. Moreover, the bounds can be made arbitrarily tight by refining the discrete barycentric measures and by improving the piecewise linear approximation of the nonlinear utility function.

It is worthwhile to remark that the evaluation of $(E^l\Phi^l_{0,J})$ and $(E^u\Phi^u_{0,J})$ requires the solution of a linear stochastic program with random recourse. In fact, the revenue balance equation (6.9) is bilinear in the decision variables and the stochastic parameters. Generally, stochastic recourse matrices can cause serious trouble (see e.g. [10, Sect. 3.1]), but in this special example they are apparently harmless.

Finally, it should be emphasized that the generalizations of the basic decision problem (6.1), which have been discussed in Sects. 6.2 through 6.5, can easily be combined. However, the correction terms should not thoughtlessly be summed up. For instance, a hydro scheduling problem with a logarithmic utility function and lognormal spot prices can be shown, under certain circumstances, to comply with the regularity conditions (B1)-(B5). Thus, no correction terms are needed.



6.6 Numerical Results 

In the remainder we present numerical results for the models proposed in Sects. 6.1 through 6.5. Moreover, we will assess the performance of the newly developed bounding techniques by analyzing the gap between the upper and lower bound on the optimal objective function value. A bounding technique performs well if the gap becomes small after only a few refinements. The deployed algorithms were implemented in C++, and CPLEX 6.6 [16] callable library routines were used to solve the arising linear programs. All computations were carried out on a 1 GHz Pentium III PC with 512 MB of memory.



6.6.1 Model Parameterization 

Consider a hydro scheduling problem with 6 decision stages indexed by t = 
0, . . . , 5. Each period between two subsequent stages comprises 28 days. Thus, 
we are facing a planning horizon of approximately half a year. 







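Table 6.2. System parameters

| Param. | Value | Unit | Range |
|---|---|---|---|
| $\bar x_{\mathrm{sell},t}$ | 1.344E+05 | MWh | $\forall t \in \tau$ |
| $\bar x_{\mathrm{buy},t}$ | 3.360E+04 | MWh | $\forall t \in \tau$ |
| $\bar x_{\mathrm{store},t}$ | 4.380E+05 | MWh | $\forall t \in \tau$ |
| $\underline{x}_{\mathrm{store},T}$ | 3.500E+05 | MWh | |
| $x_{\mathrm{store},-1}$ | 3.500E+05 | MWh | |
| $e_p$ | 7.500E-01 | | |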


Table 6.2 lists the relevant technical data of the pumped storage power plant under consideration. Since all periods are of the same length, it is reasonable to assume that the limits on the energy exchanged on the spot market are equal in all decision stages. Moreover, it is intuitively appealing that the reservoir capacity $\bar x_{\mathrm{store},t}$ is constant over time and that the amount of energy stored in the reservoir never drops below 0. However, we require the reservoir to be 80% full at the end of the last period. This is a regulatory rather than a technical restriction, and it takes account of the fact that the model has a finite horizon, whereas real operation of the plant has an indefinite horizon. Without such a restriction, too much energy would be sold in the last decision stage. See [48] for a survey of alternative approaches to mitigate this type of distortions, which are referred to as end effects in the literature.



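Table 6.3. Distributional parameters for the reference problem

| Param. | Value | Unit | Range |
|---|---|---|---|
| $H^0_t$ | 2.608E-01 | | $\forall t \in \tau_{-0}$ |
| $H^1_t$ | 2.608E-01 | | $\forall t \in \tau_{-0}$ |
| $\mu^0_t$ | 1.848E+01 | €MW⁻¹h⁻¹ | $\forall t \in \tau_{-0}$ |
| $\mu^1_t$ | 3.696E+03 | MWh | $\forall t \in \tau_{-0}$ |
| $\sigma^0_t$ | 6.869E+00 | €MW⁻¹h⁻¹ | $\forall t \in \tau_{-0}$ |
| $\sigma^1_t$ | 6.869E+03 | MWh | $\forall t \in \tau_{-0}$ |
| $\rho_t$ | -5.000E-01 | | $\forall t \in \tau_{-0}$ |
| $\eta_0$ | 2.500E+01 | €MW⁻¹h⁻¹ | |
| $\xi_0$ | 5.000E+03 | MWh | |
| $\alpha_c$ | 1.353E-01 | | |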







Besides the system parameters we need to specify the distributional parameters of the spot price and inflow processes; cf. Table 6.3. Notice that the increments of prices and inflows are negatively correlated. This assumption can easily be justified. In times of increased precipitation reservoir inflows are high, and hydro producers are forced to generate plenty of power in order to avoid spillage. Then, due to the abundant supply of electric power, spot prices will decrease. Furthermore, the confidence level specified in Table 6.3 entails truncation of each noise term at its mean plus or minus two times its standard deviation, i.e. $n_c = 2$ in (6.2).

The lower bound $\underline{x}_{\mathrm{store},T}$ on the reservoir content in the last stage induces nontrivial lower bounds on $x_{\mathrm{store},t}$ in all other decision stages (for a discussion of induced constraints see Sect. 2.3). These lower bounds on the reservoir content can be defined recursively, i.e.

$$\underline{x}_{\mathrm{store},t-1} := \max\{0,\ \underline{x}_{\mathrm{store},t} - \xi_{\min,t} - e_p\,\bar x_{\mathrm{buy},t} + \delta\}, \qquad t \in \tau_{-0}.$$

Thereby, $\xi_{\min,t} := \inf\{\xi_t \in \Xi_t\}$, and $\delta > 0$ is a small energy shift which ensures the existence of Slater points in extreme scenarios. Notice that the hydropower producer can meet the prescribed target value $\underline{x}_{\mathrm{store},T}$ with probability 1 if in every stage $t < T$ the reservoir content lies above $\underline{x}_{\mathrm{store},t}$. This might of course involve intensive pumping activity in certain scenarios. For $\delta = 0$, non-anticipativity of the constraint multifunction would still be preserved. However, for any admissible outcome and decision history with $x_{\mathrm{store},t-1} = \underline{x}_{\mathrm{store},t-1}$ and $\xi_t = \xi_{\min,t}$ there would then be only one single feasible decision vector, which is certainly no Slater point. To avoid this kind of problem, we fix $\delta$ at a very small though strictly positive value in our calculations, i.e. $\delta =$ 1.000E-06 MWh.
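The backward recursion for the induced lower bounds is straightforward to evaluate. The Python sketch below is illustrative only: it assumes the energy balance $x_{\mathrm{store},t} = x_{\mathrm{store},t-1} - x_{\mathrm{sell},t} + e_p\,x_{\mathrm{buy},t} + \xi_t$ of the reference model, time-invariant inflow parameters from Table 6.3, and obtains $\xi_{\min,t}$ from the lower-endpoint recursion of $\Xi_t$; the function names are hypothetical.

```python
def inflow_min_path(xi0, H, mu, sigma, n_c, T):
    """xi_min,t = inf Xi_t for t = 0..T (constant parameters, H >= 0)."""
    lo = [xi0]
    for _ in range(T):
        lo.append(H * lo[-1] + mu - sigma * n_c)
    return lo


def induced_lower_bounds(x_store_min_T, xi_min, e_p, x_buy_max, delta, T):
    """Backward recursion x_store_min[t-1] = max(0, x_store_min[t] - xi_min[t]
    - e_p * x_buy_max + delta), evaluated for t = T, ..., 1."""
    lb = [0.0] * (T + 1)
    lb[T] = x_store_min_T
    for t in range(T, 0, -1):
        lb[t - 1] = max(0.0, lb[t] - xi_min[t] - e_p * x_buy_max + delta)
    return lb


xi_min = inflow_min_path(5000.0, 0.2608, 3696.0, 6869.0, 2.0, 5)   # Table 6.3
lb = induced_lower_bounds(3.5e5, xi_min, 0.75, 3.36e4, 1.0e-6, 5)  # Table 6.2
for t, value in enumerate(lb):
    print(f"t = {t}: xi_min = {xi_min[t]:10.1f} MWh, lower bound = {value:10.1f} MWh")
```

Note that the normal inflow model admits negative values of $\xi_{\min,t}$ (an outflow in the worst case), which is one of the motivations for the lognormal inflows of Sect. 6.4.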

Note that the upper bounds on the reservoir content do not lead to induced constraints. This is a consequence of the fact that the inflows $\xi_t$ are much smaller than $\bar x_{\mathrm{sell},t}$ in each stage and every scenario for the given parameter setting.



6.6.2 Discretization of the Probability Space 

In order to apply the barycentric approximation scheme, we have to find a suitable decomposition of the conditional probability $P_t$ for each $t \in \tau_{-0}$; see Sect. 4.4. For simplicity, we first decompose the probability distribution $Q_t$ of the disturbances $(\varepsilon^0_t, \varepsilon^1_t)$ as in (4.30). In particular, we represent $Q_t$ as a convex combination of truncated normal distributions with rectangular supports. Thereby, the supports of two different components are required to be essentially disjoint, i.e. their intersection must be a $Q_t$-null set. Fig. 6.1 shows the supports of all components corresponding to four successively refined partitions of $Q_t$; diagram (a) represents the trivial partition, whereas the diagrams (b), (c), and (d) visualize partitions with 2, 4, and 5 components, respectively. Note that the grey shaded area depicts the 86.47% confidence ellipse of the underlying normal distribution ($\alpha_c = 13.53\% \Leftrightarrow n_c = 2$). The refinement strategy visualized in Fig. 6.1 was empirically found to perform well; however, other choices are possible.
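The correspondence between $\alpha_c$ and $n_c$ can be made explicit under an assumption: if (6.2) relates the two via $\alpha_c = \exp(-n_c^2/2)$, which reproduces the tabulated 1.353E-01 for $n_c = 2$, then the probability mass of the confidence ellipse of Mahalanobis radius $n_c$ is $1 - \alpha_c$ (a chi-square law with two degrees of freedom), whereas each truncated noise term keeps the mass $\Phi(n_c) - \Phi(-n_c)$ of its marginal. The Python sketch below, with hypothetical function names, evaluates these quantities.

```python
import math


def alpha_from_nc(n_c):
    """Assumed form of (6.2): alpha_c = exp(-n_c^2 / 2)."""
    return math.exp(-0.5 * n_c * n_c)


def nc_from_alpha(alpha_c):
    """Inverse of the assumed relation."""
    return math.sqrt(-2.0 * math.log(alpha_c))


def ellipse_probability(n_c):
    """Mass of the bivariate normal ellipse of Mahalanobis radius n_c (chi-square, 2 d.o.f.)."""
    return 1.0 - math.exp(-0.5 * n_c * n_c)


def marginal_probability(n_c):
    """Mass of one normal noise term within mean +/- n_c standard deviations."""
    return math.erf(n_c / math.sqrt(2.0))


print(alpha_from_nc(2.0), ellipse_probability(2.0), marginal_probability(2.0))
print(nc_from_alpha(0.1353))
```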

The decomposition of the probability measure $Q_t$ naturally induces a decomposition of the conditional probability $P_t$, each of whose components is of the form (4.31). Subsequently, we proceed according to the guidelines of Sect. 4.4. The components of a given $P_t$-partition are discretized separately and then combined to yield the barycentric transition probabilities. Notice that each component contributes two discretization points to the lower and upper barycentric transition probabilities, respectively. Thus, the partitions in (a), (b), (c), and (d) yield scenario trees with branching factors 2, 4, 8, and 10, respectively. The discrete barycentric measures are obtained by composition of the transition probabilities as in (4.36).

Fig. 6.1. Refinement strategy for the reference problem. Time indices are suppressed for better readability
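To see how quickly the discretized problems grow, the node and scenario counts of the barycentric scenario trees can be tabulated. The Python sketch assumes a uniform tree in which every non-leaf node has the same branching factor, a deterministic root, and the six decision stages of Sect. 6.6.1; the 8-component partition is a hypothetical extra case that doubles the largest branching factor used above to 16, and the function names are hypothetical.

```python
def branching_factor(components):
    """Each partition component contributes two discretization points (see text)."""
    return 2 * components


def tree_size(branching, stages):
    """Node and scenario counts of a uniform scenario tree with a deterministic
    root (stage 0) and `branching` successors per non-leaf node."""
    nodes = sum(branching ** t for t in range(stages))
    scenarios = branching ** (stages - 1)
    return nodes, scenarios


for comps in (1, 2, 4, 5, 8):
    b = branching_factor(comps)
    nodes, scenarios = tree_size(b, 6)
    print(f"{comps} components -> branching {b:2d}: {nodes} nodes, {scenarios} scenarios")
```

Since the number of scenarios grows like the fifth power of the branching factor here, doubling the branching factor multiplies the size of the deterministic equivalent LP by roughly 32.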

Construction of the discrete barycentric probability measures requires evaluation of the first-order moments and cross moments of prices and inflows conditional on the outcome history. Since truncated normal distributions are analytically intractable, we use Mathematica 4.1 [72] to calculate these moments by means of standard numerical integration techniques.



6.6.3 Results of the Reference Problem 



In a first step we study the basic hydro scheduling problem without complicating features. Table 6.4 contains the optimal values of the auxiliary stochastic programs associated with the discrete barycentric measures. As argued in Sect. 6.1, the optima of these auxiliary problems represent deterministic lower and upper bounds on the optimum of the original problem. By using the refinement strategy of Fig. 6.1, the relative error between the bounds can be made smaller than 3%. Notice that the optima of the lower (upper) auxiliary stochastic programs are monotonically increasing (decreasing) as the barycentric measures are refined. This observation is consistent with the statements of proposition 4.6.

Convergence of the bounds due to refinements is visualized in Fig. 6.2. Notice that, here, an increase of the branching factor from 8 to 10 improves accuracy only slightly. Intuitively, a significant improvement could be expected only when doubling the branching factor; in that case, however, the associated linear programs become too large to be solved.

Table 6.4. Results of the reference problem

| Branching factor (−) | Lower bound $(E^l\Phi^l_0)$ (€) | Upper bound $(E^u\Phi^u_0)$ (€) | Abs. error (€) | Rel. error (−) |
|---|---|---|---|---|
| 2 | 7.601E+06 | 1.005E+07 | 2.450E+06 | 3.223E-01 |
| 4 | 8.155E+06 | 8.558E+06 | 4.025E+05 | 4.936E-02 |
| 8 | 8.176E+06 | 8.413E+06 | 2.376E+05 | 2.906E-02 |
| 10 | 8.183E+06 | 8.409E+06 | 2.270E+05 | 2.774E-02 |

Fig. 6.2. Convergence of bounds due to refinements
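The error columns of Table 6.4 can be recomputed from the rounded bounds; discrepancies in the last digit stem from rounding. The Python sketch below suggests that the relative error is normalized by the lower bound, and it also checks the monotonicity of the bounds under refinement noted above. The table rows are copied from Table 6.4; the function names are hypothetical.

```python
# (branching factor, lower bound, upper bound) in EUR, copied from Table 6.4
ROWS = [(2, 7.601e6, 1.005e7), (4, 8.155e6, 8.558e6),
        (8, 8.176e6, 8.413e6), (10, 8.183e6, 8.409e6)]


def gaps(rows):
    """Absolute gap (upper - lower) and gap relative to the lower bound."""
    return [(b, up - lo, (up - lo) / lo) for b, lo, up in rows]


def is_monotone(rows):
    """Lower bounds nondecreasing and upper bounds nonincreasing under refinement."""
    lows = [lo for _, lo, _ in rows]
    ups = [up for _, _, up in rows]
    return (all(x <= y for x, y in zip(lows, lows[1:]))
            and all(x >= y for x, y in zip(ups, ups[1:])))


for b, abs_gap, rel_gap in gaps(ROWS):
    print(f"{b:>2}: abs {abs_gap:.3e} EUR, rel {rel_gap:.3e}")
print("monotone:", is_monotone(ROWS))
```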



6.6.4 Hydro Scheduling Problem with Market Power 

Let us now investigate the decision problem with market power, and fix the inverse demand elasticity at $c_t :=$ 2.121E-06 MW⁻¹h⁻¹ for all $t \in \tau$ (see Sect. 6.2). With this choice, the output-sensitive profit functions differ from the inelastic profit functions of Sect. 6.1 by up to 30%, which implies that the hydropower producer has significant market power. In order to solve the decision problem with elastic prices by means of an LP solver, we have to linearize the quadratic functions $f_t$ as in (6.4). Here, we use a piecewise linearization $f_{t,1}$ with 5 segments, which are completely determined by the data in Table 6.5. Note that the parameter labelling this linearization is set to $J = 1$. An elementary calculation shows that $f_t$ and its approximation $f_{t,1}$ differ by at most 330 MWh for any admissible argument $x_{\mathrm{net},t}$. Then, with $T + 1 = 6$ and $E(\eta_t) = 25$ €MW⁻¹h⁻¹ for all $t \in \tau$, the tolerance $\varepsilon$ in the estimate (6.5) can be set to 5.000E+04 €, since $(T+1)\,E(\eta_t)\cdot 330$ MWh $=$ 4.95E+04 €.
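The figures quoted above can be cross-checked with a few lines of Python. The sketch below assumes that the first data column of Table 6.5 holds the slopes $f_{t,1,i}$ and the second the intercepts $f^0_{t,1,i}$, as their units suggest. It evaluates $f_{t,1}$ as the lower envelope of the five pieces on a 4 MWh grid over $[-\bar x_{\mathrm{buy},t}, \bar x_{\mathrm{sell},t}]$ and reports the maximum deviation, the smallest tolerance compatible with (6.5), and the maximum relative price reduction $c_t\,\bar x_{\mathrm{sell},t}$.

```python
C = 2.121e-06                     # c_t in 1/MWh (Sect. 6.6.4)
X_BUY, X_SELL = 3.360e4, 1.344e5  # trading limits of Table 6.2 (MWh)
T_PLUS_1, MEAN_PRICE = 6, 25.0    # T + 1 stages, E(eta_t) in EUR/MWh

# (slope f_{t,1,i}, intercept f^0_{t,1,i} in MWh), read from Table 6.5
PIECES = [(1.071, 265.2), (0.9288, 265.2), (0.7863, 5053.0),
          (0.6438, 14630.0), (0.5013, 28990.0)]


def f(x):
    """Quadratic revenue factor f_t(x_net) = x_net - c_t x_net^2."""
    return x - C * x * x


def f_lin(x):
    """Piecewise linear approximation f_{t,1}: lower envelope of the pieces."""
    return min(f0 + slope * x for slope, f0 in PIECES)


grid = [-X_BUY + k for k in range(0, int(X_BUY + X_SELL) + 1, 4)]   # 4 MWh steps
max_dev = max(abs(f(x) - f_lin(x)) for x in grid)
eps_needed = T_PLUS_1 * MEAN_PRICE * max_dev    # smallest eps compatible with (6.5)
price_drop = C * X_SELL                          # max relative price reduction

print(f"max |f - f_1| = {max_dev:.1f} MWh, eps >= {eps_needed:.3e} EUR, "
      f"price drop up to {price_drop:.1%}")
```

The intercepts of the pieces appear to be shifted upward relative to plain chords, so that over- and underestimation roughly balance; this is why the maximum deviation is close to half of the chord error $c_t h^2/4$ for the 33 600 MWh segment width.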

Table 6.5. Piecewise linearization of the inverse demand function

| $i$ | $f_{t,1,i}$ (−) | $f^0_{t,1,i}$ (MWh) |
|---|---|---|
| 1 | 1.071E+00 | 2.652E+02 |
| 2 | 9.288E-01 | 2.652E+02 |
| 3 | 7.863E-01 | 5.053E+03 |
| 4 | 6.438E-01 | 1.463E+04 |
| 5 | 5.013E-01 | 2.899E+04 |


Table 6.6. Results of the hydro scheduling problem with market power

| Branching factor (−) | Lower bound $(E^l\Phi^l_{0,1}) - \varepsilon$ (€) | Upper bound $(E^u\Phi^u_{0,1}) + \varepsilon$ (€) | Abs. error (€) | Rel. error (−) |
|---|---|---|---|---|
| 2 | 6.574E+06 | 7.737E+06 | 1.163E+06 | 1.770E-01 |
| 4 | 6.708E+06 | 7.084E+06 | 3.754E+05 | 5.595E-02 |
| 8 | 6.749E+06 | 6.968E+06 | 2.189E+05 | 3.244E-02 |
| 10 | 6.757E+06 | 6.966E+06 | 2.092E+05 | 3.096E-02 |



By convention, $(E^l\Phi^l_{0,1})$ and $(E^u\Phi^u_{0,1})$ denote the optimal values of the hydro scheduling problem with market power if the quadratic functions $\{f_t\}_{t\in\tau}$ are replaced by $\{f_{t,1}\}_{t\in\tau}$, and the original measure $P$ is substituted by the discrete barycentric measures $P^l$ and $P^u$, respectively. As pointed out in Sect. 6.2, the optimum of the decision problem with market power is squeezed between the optimal values of the auxiliary stochastic programs shifted by the tolerance $\varepsilon$. Table 6.6 contains the lower and upper bounds on the optimal objective value. Moreover, Fig. 6.3 shows convergence of the bounds under the refinement strategy of Fig. 6.1.



6.6.5 Hydro Scheduling Problem with Lognormal Prices 

Next, turn to the decision problem of Sect. 6.3. In this model $\eta_t$ is given a new interpretation as the logarithm of the spot price at time $t$. Nevertheless, in order to obtain prices of the same order of magnitude as in the reference calculation, the distributional parameters of the random variables $\{\eta_t\}_{t\in\tau}$ should be suitably modified. The adjusted parameters are listed in Table 6.7. Recalibration of the stochastic processes goes along with the specification of a new refinement strategy; cf. Fig. 6.4. This strategy has empirically been found to be effective, and it almost coincides with the one of Fig. 6.1.

Fig. 6.3. Bounds in the presence of elastic prices



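Table 6.7. Distributional parameters in the presence of lognormal prices

| Param. | Value | Unit | Range |
|---|---|---|---|
| $H^0_t$ | 2.608E-01 | | $\forall t \in \tau_{-0}$ |
| $H^1_t$ | 2.608E-01 | | $\forall t \in \tau_{-0}$ |
| $\mu^0_t$ | 2.048E+00 | | $\forall t \in \tau_{-0}$ |
| $\mu^1_t$ | 3.696E+03 | MWh | $\forall t \in \tau_{-0}$ |
| $\sigma^0_t$ | 4.274E-01 | | $\forall t \in \tau_{-0}$ |
| $\sigma^1_t$ | 6.869E+03 | MWh | $\forall t \in \tau_{-0}$ |
| $\rho_t$ | -5.000E-01 | | $\forall t \in \tau_{-0}$ |
| $\eta_0$ | 3.219E+00 | | |
| $\xi_0$ | 5.000E+03 | MWh | |
| $\alpha_c$ | 1.353E-01 | | |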







Fig. 6.4. Refinement strategy for the hydro scheduling problem with lognormal prices. Time indices are suppressed for better readability

As argued in Sect. 6.3, the determination of bounds on the optimal objective function value involves the solution of the auxiliary stochastic programs (4.40) as well as the calculation of the expectation values $\langle E\Delta_0\rangle$, $\langle E^l\Delta^l_0\rangle$, and $\langle E^u\Delta^u_0\rangle$. The auxiliary stochastic programs can be tackled in the usual way by means of an LP solver, and their optimal values are listed in the upper panel of Table 6.8 (recall that we work with the refinement strategy of Fig. 6.4). Since the barycentric measures $P^l$ and $P^u$ are discrete, evaluation of $\langle E^l\Delta^l_0\rangle$ and $\langle E^u\Delta^u_0\rangle$ reduces to the calculation of two finite sums; the numerical values in the middle panel of Table 6.8 are based on the choice $\bar{x}_t :=$ 3.360E+04 MWh for all $t\in\mathcal{T}$. In contrast, evaluation of $\langle E\Delta_0\rangle$ requires integration of the correction terms with respect to the original probability measure. By using serial independence of the disturbances $\{\varepsilon^0_t\}_{t\in\mathcal{T}_{-0}}$, we may write

$$\langle E\Delta_0\rangle = \sum_{t=1}^{T}\bar{x}_t\,\exp\big(\theta^0_{1,t}\,\eta_0\big)\prod_{s=1}^{t}E\Big(\exp\big(\theta^0_{s+1,t}\,\varepsilon^0_s\big)\Big).$$

Then, the expectations of the exponentiated disturbances under the truncated normal distribution are calculated numerically with the aid of Mathematica. For the given parameter values we obtain $\langle E\Delta_0\rangle =$ 2.977E+06 €.
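The expectations of exponentiated truncated normal disturbances are obtained with Mathematica in the text. For a standard normal $Z$ truncated to $[a,b]$ there is also a closed form, $E(\exp(cZ)\mid a\le Z\le b)=\exp(c^2/2)\,(\Phi(b-c)-\Phi(a-c))/(\Phi(b)-\Phi(a))$, where $\Phi$ denotes the standard normal distribution function. The sketch below implements this formula, cross-checks it against Simpson quadrature, and uses serial independence to turn the expectation of a product into a product of one-dimensional expectations. The truncation bounds a and b, their relation to the confidence level $\alpha_c$, and all function names are assumptions made for illustration; a disturbance of the form $m+\sigma Z$ would contribute $\exp(cm)$ times `truncated_mgf(c * sigma, a, b)`.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal distribution function expressed through math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def std_normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def truncated_mgf(c: float, a: float, b: float) -> float:
    """Closed form of E[exp(c*Z)] for Z ~ N(0, 1) conditioned on a <= Z <= b."""
    mass = std_normal_cdf(b) - std_normal_cdf(a)
    shifted = std_normal_cdf(b - c) - std_normal_cdf(a - c)
    return math.exp(0.5 * c * c) * shifted / mass


def truncated_mgf_simpson(c: float, a: float, b: float, n: int = 2000) -> float:
    """Same expectation by composite Simpson quadrature (n must be even)."""
    h = (b - a) / n
    total = 0.0
    for k in range(n + 1):
        weight = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)
        z = a + k * h
        total += weight * math.exp(c * z) * std_normal_pdf(z)
    return (total * h / 3.0) / (std_normal_cdf(b) - std_normal_cdf(a))


def expected_exp_of_sum(coeffs, sigmas, a, b):
    """E[exp(sum_s k_s * sigma_s * Z_s)] for independent truncated Z_s (hypothetical helper).

    Serial independence turns the expectation of the product of the
    exponentials into a product of one-dimensional expectations.
    """
    result = 1.0
    for k, sigma in zip(coeffs, sigmas):
        result *= truncated_mgf(k * sigma, a, b)
    return result


if __name__ == "__main__":
    # Illustrative numbers only: sigma taken from Table 6.7, truncation at +/-2 is assumed.
    c, a, b = 4.274e-01, -2.0, 2.0
    print(truncated_mgf(c, a, b), truncated_mgf_simpson(c, a, b))
```

Truncation to a bounded interval is what keeps these exponential moments finite on a compact support; without truncation the formula reduces to the familiar $\exp(c^2/2)$.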

The bounds on the optimal value of the original problem can now be calculated by suitably combining the optimal values of the auxiliary stochastic programs and the expectation values $\langle E\Delta_0\rangle$, $\langle E^l\Delta^l_0\rangle$, and $\langle E^u\Delta^u_0\rangle$; cf. the lower panel of Table 6.8. Figure 6.5 visualizes the development of the bounds under the refinement strategy of Fig. 6.4. Observe that the bounds are still close to the optimal values of the auxiliary stochastic programs.
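The combination of the panels is plain arithmetic: following the column headers of Table 6.8, the lower bound is $\langle E^l\Phi^l_0\rangle+\langle E^l\Delta^l_0\rangle-\langle E\Delta_0\rangle$ and the upper bound is $\langle E^u\Phi^u_0\rangle+\langle E^u\Delta^u_0\rangle-\langle E\Delta_0\rangle$, while the relative error appears to be the absolute gap divided by the lower bound (inferred from the tabulated numbers). The sketch below recomputes the lower panel from the two upper panels. The optional `eps` argument, which widens the bracket by $\pm\varepsilon$, is our addition so that the same helper also fits the risk-averse case with its approximation tolerance. Because the table entries are rounded to four digits, agreement is only to about $10^{-3}$.

```python
# Upper and middle panels of Table 6.8 (lognormal prices), keyed by branching factor:
# (optimum under lower measure, optimum under upper measure) and
# (expected lower correction term, expected upper correction term), all in EUR.
AUX_OPTIMA = {2: (6.228e6, 1.020e7), 4: (6.801e6, 7.937e6),
              8: (6.927e6, 7.385e6), 10: (6.934e6, 7.360e6)}
CORRECTIONS = {2: (2.776e6, 3.924e6), 4: (2.885e6, 3.176e6),
               8: (2.925e6, 3.025e6), 10: (2.928e6, 3.017e6)}
E_DELTA_0 = 2.977e6  # <E Delta_0> under the original measure, EUR


def adjusted_bounds(aux_lower, aux_upper, corr_lower, corr_upper, e_delta, eps=0.0):
    """Shift the auxiliary optima by the expected correction terms.

    eps is an optional tolerance (zero for Table 6.8) that widens the
    bracket symmetrically; returns (lower, upper, abs. gap, rel. gap).
    """
    lower = aux_lower + corr_lower - e_delta - eps
    upper = aux_upper + corr_upper - e_delta + eps
    gap = upper - lower
    return lower, upper, gap, gap / lower


if __name__ == "__main__":
    for factor in sorted(AUX_OPTIMA):
        lo, up, gap, rel = adjusted_bounds(*AUX_OPTIMA[factor], *CORRECTIONS[factor], E_DELTA_0)
        print(f"{factor:3d}  {lo:.3e}  {up:.3e}  {gap:.3e}  {rel:.3e}")
```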



Table 6.8. Results of the hydro scheduling problem with lognormal prices

Optima of auxiliary stochastic programs
Branching factor (-) | $\langle E^l\Phi^l_0\rangle$ (€) | $\langle E^u\Phi^u_0\rangle$ (€) | Error abs. (€) | Error rel. (-)
2 | 6.228E+06 | 1.020E+07 | 3.969E+06 | 6.372E-01
4 | 6.801E+06 | 7.937E+06 | 1.136E+06 | 1.671E-01
8 | 6.927E+06 | 7.385E+06 | 4.580E+05 | 6.612E-02
10 | 6.934E+06 | 7.360E+06 | 4.264E+05 | 6.150E-02

Expected correction terms
Branching factor (-) | $\langle E^l\Delta^l_0\rangle$ (€) | $\langle E^u\Delta^u_0\rangle$ (€) | Error abs. (€) | Error rel. (-)
2 | 2.776E+06 | 3.924E+06 | 1.149E+06 | 4.139E-01
4 | 2.885E+06 | 3.176E+06 | 2.903E+05 | 1.006E-01
8 | 2.925E+06 | 3.025E+06 | 1.004E+05 | 3.432E-02
10 | 2.928E+06 | 3.017E+06 | 8.921E+04 | 3.047E-02

Bounds
Branching factor (-) | $\langle E^l\Phi^l_0\rangle+\langle E^l\Delta^l_0\rangle-\langle E\Delta_0\rangle$ (€) | $\langle E^u\Phi^u_0\rangle+\langle E^u\Delta^u_0\rangle-\langle E\Delta_0\rangle$ (€) | Error abs. (€) | Error rel. (-)
2 | 6.027E+06 | 1.114E+07 | 5.118E+06 | 8.491E-01
4 | 6.709E+06 | 8.136E+06 | 1.427E+06 | 2.127E-01
8 | 6.875E+06 | 7.434E+06 | 5.584E+05 | 8.122E-02
10 | 6.884E+06 | 7.400E+06 | 5.156E+05 | 7.489E-02

Fig. 6.5. Adjusted bounds in the presence of lognormal prices

6.6.6 Hydro Scheduling Problem with Lognormal Inflows

This paragraph is devoted to the study of the decision problem with lognormal inflows, which has been described in Sect. 6.4. Thus, $\xi_t$ no longer represents the reservoir inflow in period t, but its logarithm. To obtain inflows of the same order of magnitude as in the reference problem, we have to recalibrate the stochastic process $\{\xi_t\}_{t\in\mathcal{T}}$. Table 6.9 contains the adjusted distributional parameters, which will be used henceforth. Note that, here again, $\eta_t$ stands for the spot price of energy at stage t, and not for its logarithm.



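Table 6.9. Distributional parameters in the presence of lognormal inflows

Param. | Value | Unit | Range
$\theta^0_t$ | 2.608E-01 | - | $\forall t\in\mathcal{T}_{-0}$
$\theta^1_t$ | 2.608E-01 | - | $\forall t\in\mathcal{T}_{-0}$
$\mu^0_t$ | 1.848E+01 | € MW⁻¹ h⁻¹ | $\forall t\in\mathcal{T}_{-0}$
$\mu^1_t$ | 7.934E+00 | - | $\forall t\in\mathcal{T}_{-0}$
$\sigma^0_t$ | 6.869E+00 | € MW⁻¹ h⁻¹ | $\forall t\in\mathcal{T}_{-0}$
$\sigma^1_t$ | 1.566E-01 | - | $\forall t\in\mathcal{T}_{-0}$
$\rho_t$ | -5.000E-01 | - | $\forall t\in\mathcal{T}_{-0}$
$\eta_0$ | 2.500E+01 | € MW⁻¹ h⁻¹ |
$\xi_0$ | 1.082E+01 | - |
$\alpha_c$ | 1.353E-01 | - |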

We achieve good convergence behavior of the bounds on the objective value by using the refinement strategy of Fig. 6.6. Since the correction terms involved depend only on the logarithmic inflows, subdivision of the simplex $\Xi_t$ is necessary for $\langle E^l\Delta^l_0\rangle$ and $\langle E^u\Delta^u_0\rangle$ to converge to the expectation value $\langle E\Delta_0\rangle$ (see e.g. Fig. 6.6 (c)).



Fig. 6.6. Refinement strategy for the hydro scheduling problem with lognormal inflows. Time indices are suppressed for better readability



As usual, a powerful LP solver can cope with the auxiliary stochastic programs; the corresponding objective values are listed in the upper panel of Table 6.10. Furthermore, the expectation values $\langle E^l\Delta^l_0\rangle$ and $\langle E^u\Delta^u_0\rangle$ are easily evaluated as weighted sums over the scenarios of the discrete measures $P^l$ and $P^u$, respectively. The numerical values in the middle panel of Table 6.10 are based on the choice $\bar{\lambda}_t :=$ 5.867E+01 € MW⁻¹ h⁻¹, $t\in\mathcal{T}$. In fact, for the given confidence level $\alpha_c =$ 1.353E-01, the constant $\bar{\lambda}_t$ represents a strict uniform upper bound on the shadow price of energy at stage t (note that the spot price is always smaller than 44 € MW⁻¹ h⁻¹; dividing this value by the energy conversion factor $e_p$ yields the proposed constant $\bar{\lambda}_t$). Evaluation of $\langle E\Delta_0\rangle$ is more complicated, as it requires integration of the correction terms with respect to the absolutely continuous probability measure P. By using serial independence of the disturbances $\{\varepsilon^1_t\}_{t\in\mathcal{T}_{-0}}$, we obtain

$$\langle E\Delta_0\rangle = -\sum_{t=1}^{T}\bar{\lambda}_t\,\exp\big(\theta^1_{1,t}\,\xi_0\big)\prod_{s=1}^{t}E\Big(\exp\big(\theta^1_{s+1,t}\,\varepsilon^1_s\big)\Big).$$

As before, the expectations of the exponentiated disturbances under the truncated normal distribution are calculated numerically with the aid of Mathematica. For the given parameter setting we obtain $\langle E\Delta_0\rangle =$ -1.366E+07 €.



Table 6.10. Results of the hydro scheduling problem with lognormal inflows

Optima of auxiliary stochastic programs
Branching factor (-) | $\langle E^l\Phi^l_0\rangle$ (€) | $\langle E^u\Phi^u_0\rangle$ (€) | Error abs. (€) | Error rel. (-)
2 | 7.401E+06 | 1.020E+07 | 2.797E+06 | 3.779E-01
4 | 7.952E+06 | 8.062E+06 | 1.108E+05 | 1.393E-02
8 | 7.784E+06 | 8.080E+06 | 2.968E+05 | 3.813E-02
10 | 7.787E+06 | 8.026E+06 | 2.391E+05 | 3.070E-02

Expected correction terms
Branching factor (-) | $\langle E^l\Delta^l_0\rangle$ (€) | $\langle E^u\Delta^u_0\rangle$ (€) | Error abs. (€) | Error rel. (-)
2 | -1.423E+07 | -1.353E+07 | 7.048E+05 | 4.951E-02
4 | -1.423E+07 | -1.354E+07 | 6.935E+05 | 4.872E-02
8 | -1.378E+07 | -1.360E+07 | 1.753E+05 | 1.272E-02
10 | -1.378E+07 | -1.360E+07 | 1.751E+05 | 1.271E-02

Bounds
Branching factor (-) | $\langle E^l\Phi^l_0\rangle+\langle E^l\Delta^l_0\rangle-\langle E\Delta_0\rangle$ (€) | $\langle E^u\Phi^u_0\rangle+\langle E^u\Delta^u_0\rangle-\langle E\Delta_0\rangle$ (€) | Error abs. (€) | Error rel. (-)
2 | 6.825E+06 | 1.033E+07 | 3.501E+06 | 5.131E-01
4 | 7.376E+06 | 8.180E+06 | 8.042E+05 | 1.090E-01
8 | 7.664E+06 | 8.136E+06 | 4.721E+05 | 6.160E-02
10 | 7.667E+06 | 8.081E+06 | 4.142E+05 | 5.402E-02



The bounds on the true objective value, which can be calculated in the 
usual way, are listed in the lower panel of Table 6.10. Moreover, these bounds 
as well as the optimal values of the auxiliary stochastic programs are visualized 
in Fig. 6.7. 
Fig. 6.7. Adjusted bounds in the presence of lognormal inflows



Observe that the gap between the bounds decreases monotonically as the branching factor increases. However, the difference between $\langle E^l\Phi^l_0\rangle$ and $\langle E^u\Phi^u_0\rangle$ oscillates. This is clear evidence that the saddle structure of at least some recourse functions is really lost in the present example. Otherwise, monotonicity should hold due to proposition 4.6.
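Both statements can be checked directly on the rounded entries of Table 6.10. The short sketch below recomputes, for each branching factor, the distance between the two auxiliary optima (upper panel) and between the two adjusted bounds (lower panel), and tests whether each sequence is nonincreasing. The helper names are ours and serve only this illustration.

```python
BRANCHING = [2, 4, 8, 10]
# Table 6.10, upper panel: optima of the auxiliary programs (lower, upper measure), EUR.
AUX = [(7.401e6, 1.020e7), (7.952e6, 8.062e6), (7.784e6, 8.080e6), (7.787e6, 8.026e6)]
# Table 6.10, lower panel: adjusted bounds (lower, upper), EUR.
BOUNDS = [(6.825e6, 1.033e7), (7.376e6, 8.180e6), (7.664e6, 8.136e6), (7.667e6, 8.081e6)]


def gaps(pairs):
    """Differences upper - lower for each refinement level."""
    return [up - lo for lo, up in pairs]


def is_nonincreasing(values):
    """True if every entry is at most its predecessor."""
    return all(nxt <= prev for prev, nxt in zip(values, values[1:]))


if __name__ == "__main__":
    for factor, g_aux, g_bnd in zip(BRANCHING, gaps(AUX), gaps(BOUNDS)):
        print(f"{factor:3d}  aux gap {g_aux:.3e}  bound gap {g_bnd:.3e}")
    print("bound gap nonincreasing:", is_nonincreasing(gaps(BOUNDS)))
    print("aux gap nonincreasing:  ", is_nonincreasing(gaps(AUX)))
```

The auxiliary gap drops sharply at branching factor 4 and then widens again at 8, which is the oscillation referred to above, whereas the gap between the adjusted bounds shrinks at every step.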



6.6.7 Hydro Scheduling Problem with Risk-Aversion



Finally, let us study the decision problem with risk-aversion. For simplicity, we work with the parameter setting of the reference problem. Thus, $\{\eta_t\}_{t\in\mathcal{T}}$ and $\{\xi_t\}_{t\in\mathcal{T}}$ are interpreted as the spot price and the inflow process, respectively, with distributional parameters given by Table 6.3. Discretization of the probability space is based on the refinement strategy of Fig. 6.1. Moreover, let us assume that the decision maker's attitude towards risk is characterized by a concave quadratic utility function of the form

$$U(x_{\mathrm{rev},T}) := U_l\,x_{\mathrm{rev},T} - U_q\,x_{\mathrm{rev},T}^2 \quad\text{with}\quad U_l := 1.700\mathrm{E}{+}00\ \text{€}^{-1},\quad U_q := 5.000\mathrm{E}{-}08\ \text{€}^{-2}.$$

By definition, U assigns a dimensionless utility index to every amount $x_{\mathrm{rev},T}$ of wealth accumulated over the entire planning horizon (see also Sect. 6.5). Since the spot price of energy is smaller than 44 €/MWh at any time, rough upper and lower estimates for total wealth $x_{\mathrm{rev},T}$ over all outcomes and all decision strategies feasible in (4.40) are given by $\overline{x}_{\mathrm{rev},T} :=$ 3.548E+07 € and $\underline{x}_{\mathrm{rev},T} :=$ -8.870E+06 €, respectively.



Table 6.11. Piecewise linearization of the utility function

i | $U^0_{1,i}$ (-) | $U^1_{1,i}$ (€⁻¹) | i | $U^0_{1,i}$ (-) | $U^1_{1,i}$ (€⁻¹)
1 | 4.950E+06 | 2.7 | 9 | 9.750E+06 | 0.3
2 | 2.400E+06 | 2.4 | 10 | 1.440E+07 | 0.0
3 | 7.500E+05 | 2.1 | 11 | 1.995E+07 | -0.3
4 | 0.000E+00 | 1.8 | 12 | 2.640E+07 | -0.6
5 | 1.500E+05 | 1.5 | 13 | 3.375E+07 | -0.9
6 | 1.200E+06 | 1.2 | 14 | 4.200E+07 | -1.2
7 | 3.150E+06 | 0.9 | 15 | 5.115E+07 | -1.5
8 | 6.000E+06 | 0.6 | 16 | 6.120E+07 | -1.8


In order to tackle the hydro scheduling problem with risk-aversion by means of an LP solver, we linearize the quadratic utility function U as in (6.10). Here, we choose a piecewise linearization $U_1$ with 16 segments, each of which is determined by the data in Table 6.11. Notice that the parameter J labelling this linearization is set to J = 1. A tedious though elementary calculation shows that $U_1$ differs from U by at most the tolerance $\varepsilon :=$ 5.000E+04 on the compact interval $[\underline{x}_{\mathrm{rev},T},\overline{x}_{\mathrm{rev},T}]$.
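The entries of Table 6.11 can be regenerated from the utility coefficients $U_l =$ 1.700E+00 €⁻¹, $U_q =$ 5.000E-08 €⁻² and the tolerance $\varepsilon =$ 5.000E+04: the sixteen slopes are equally spaced from 2.7 down to -1.8 €⁻¹, and each intercept coincides with the intercept $(U_l - \text{slope})^2/(4U_q)$ of the tangent to $U$ with that slope, lowered by $\varepsilon$. This reading of the table is our own reconstruction from the tabulated numbers; the construction (6.10) itself is not reproduced here, and the names in the sketch are hypothetical.

```python
U_L = 1.700e00   # linear utility coefficient U_l, 1/EUR
U_Q = 5.000e-08  # quadratic utility coefficient U_q, 1/EUR^2
EPS = 5.000e04   # tolerance epsilon


def tangent_intercept(slope: float) -> float:
    """Intercept of the tangent to U(x) = U_L*x - U_Q*x**2 that has the given slope."""
    x0 = (U_L - slope) / (2.0 * U_Q)  # point where U'(x0) = U_L - 2*U_Q*x0 equals slope
    return U_L * x0 - U_Q * x0 ** 2 - slope * x0


# Sixteen equally spaced slopes 2.7, 2.4, ..., -1.8 (in 1/EUR).
SLOPES = [round(2.7 - 0.3 * i, 1) for i in range(16)]
# Intercepts of the corresponding tangents, shifted down by EPS.
INTERCEPTS = [tangent_intercept(s) - EPS for s in SLOPES]

if __name__ == "__main__":
    for i, (a, s) in enumerate(zip(INTERCEPTS, SLOPES), start=1):
        print(f"{i:2d}  {a:.3e}  {s:+.1f}")
```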




Fig. 6.8. Adjusted bounds in the presence of risk aversion 
Table 6.12. Results of the hydro scheduling problem with risk aversion

Optima of auxiliary stochastic programs
Branching factor (-) | $\langle E^l\Phi^l_{0,1}\rangle$ (-) | $\langle E^u\Phi^u_{0,1}\rangle$ (-) | Error abs. (-) | Error rel. (-)
2 | 9.979E+06 | 1.182E+07 | 1.838E+06 | 1.842E-01
4 | 1.046E+07 | 1.079E+07 | 3.342E+05 | 3.196E-02
8 | 1.046E+07 | 1.069E+07 | 2.297E+05 | 2.196E-02
10 | 1.048E+07 | 1.069E+07 | 2.067E+05 | 1.972E-02

Expected correction terms
Branching factor (-) | $\langle E^l\Delta^l_0\rangle$ (-) | $\langle E^u\Delta^u_0\rangle$ (-) | Error abs. (-) | Error rel. (-)
2 | 6.787E+06 | 8.579E+06 | 1.792E+06 | 2.640E-01
4 | 6.969E+06 | 7.418E+06 | 4.486E+05 | 6.436E-02
8 | 7.032E+06 | 7.191E+06 | 1.585E+05 | 2.254E-02
10 | 7.033E+06 | 7.191E+06 | 1.583E+05 | 2.252E-02

Bounds
Branching factor (-) | $\langle E^l\Phi^l_{0,1}\rangle+\langle E^l\Delta^l_0\rangle-\langle E\Delta_0\rangle-\varepsilon$ (-) | $\langle E^u\Phi^u_{0,1}\rangle+\langle E^u\Delta^u_0\rangle-\langle E\Delta_0\rangle+\varepsilon$ (-) | Error abs. (-) | Error rel. (-)
2 | 9.537E+06 | 1.327E+07 | 3.730E+06 | 3.911E-01
4 | 1.020E+07 | 1.108E+07 | 8.828E+05 | 8.656E-02
8 | 1.027E+07 | 1.075E+07 | 4.883E+05 | 4.756E-02
10 | 1.028E+07 | 1.075E+07 | 4.651E+05 | 4.522E-02



Denote by $\langle E^l\Phi^l_{0,1}\rangle$ and $\langle E^u\Phi^u_{0,1}\rangle$ the optimal values of the hydro scheduling problem with risk-aversion if the quadratic utility function U is replaced by $U_1$, and the original measure P is substituted by the discrete barycentric measures $P^l$ and $P^u$, respectively. These optimal values can be calculated with a standard LP solver; cf. the upper panel of Table 6.12. As pointed out in Sect. 6.5, the optimum of the decision problem with risk-aversion is squeezed between the optimal values of the auxiliary stochastic programs shifted by a combination of the expectation values $\langle E^l\Delta^l_0\rangle$, $\langle E^u\Delta^u_0\rangle$, $\langle E\Delta_0\rangle$, and the tolerance $\varepsilon$. As usual, evaluation of $\langle E^l\Delta^l_0\rangle$ and $\langle E^u\Delta^u_0\rangle$ reduces to the calculation of two finite sums. The numerical values in the middle panel of Table 6.12 correspond to the correction terms proposed in Sect. 6.5 with the parameter value 1.084E+04 MWh² €⁻². In contrast, evaluation of $\langle E\Delta_0\rangle$ is fairly involved, as it requires integration of the correction terms with respect to the original probability measure P. Reexpressing the spot prices in terms of the disturbances $\{\varepsilon^0_t\}_{t\in\mathcal{T}_{-0}}$, we obtain

t , t 2 

(EA 0 ) =4E £ + E H °+iA) 

t=0 ' s=l 



Thus, the expectation value $\langle E\Delta_0\rangle$ can be written as a linear combination of the first and second order moments of the noise terms, which are conveniently calculated by means of Mathematica. Under the current parameterization we end up with $\langle E\Delta_0\rangle =$ 7.180E+06. Then, the bounds on the true objective value can be calculated according to the recipe of Sect. 6.5, and the corresponding results are listed in the lower panel of Table 6.12. Moreover, these bounds as well as the optimal values of the auxiliary stochastic programs are visualized in Fig. 6.8.

As in the other examples of this section, a suitable refinement strategy can reduce the relative gap between the bounds by approximately a factor of 10.




7 



Conclusions 



7.1 Summary of Main Results 

In Chap. 2 we study general multistage stochastic programs with nonanticipative constraint multifunctions. Some important terminology is introduced, and several basic results in the field of stochastic optimization are reviewed. We extend the standard terminology by defining the ‘natural domains’ $\{Y_t\}_{t\in\mathcal{T}}$ and $\{Z_t\}_{t\in\mathcal{T}}$ as specific coordinate projections of the graph of the constraint multifunction. This definition proves useful in many contexts and facilitates the formulation of precise statements. In addition, we establish the fundamental regularity conditions (A1)-(A5), which ensure equivalence of the static and dynamic versions of a multistage stochastic program.

Chapter 3 addresses convex maximization problems with profit functions 
being concave and constraint functions being convex in the decision variables 
for all fixed values of the random parameters. As a principal objective, we 
study the implications of some new regularity conditions (B1)-(B5), which 
can be shown to imply (A1)-(A5). If a stochastic program complies with 
these restrictive conditions, then its recourse functions are subdifferentiable 
and exhibit a characteristic saddle structure on a convex neighborhood of their 
natural domains. Our approach relaxes some standard assumptions which are widely used in the stochastic programming literature:

• we require the profit and constraint functions to be saddle-shaped and convex, respectively, on a neighborhood of their natural domains (instead of the entire underlying spaces);

• we weaken the usual strict feasibility condition by allowing for linear affine equality constraints;

• we require compactness of the natural domains $\{Y_t\}_{t\in\mathcal{T}}$ instead of the level sets $\{\mathrm{lev}_{\le 0} f_t\}_{t\in\mathcal{T}}$, which may even be unbounded here.

Restricting the convexity properties of the profit and constraint functions to a neighborhood of their natural domains has decisive technical advantages (see Chap. 5). In addition, the possibility to incorporate equality constraints into an optimization model is of vital importance for most real-life applications (see e.g. Chap. 6).

Chapter 4 develops the barycentric approximation scheme as a classical 
bounding method for scenario generation. We recover the well-known result 
that the optima of the auxiliary stochastic programs associated with the lower 
and upper barycentric measures provide lower and upper bounds on the true 
optimum, respectively. These bounds can be made arbitrarily tight by means 
of suitable partitioning techniques. Here, we propose a partitioning scheme 
which differs from the standard approach in the literature. In fact, we suggest
partitioning of the conditional probability measures, instead of their supports. 
This procedure provides more flexibility for synthesizing scenario trees and 
circumvents the difficulty of handling ‘half open’ cross-simplices. Finally, we 
propose a new method for bounding the optimal policy. In particular, we 
construct a sequence of compact sets, each of which covers the optimal first 
stage decisions of the original stochastic program. This sequence is shown to 
converge to the true optimizer set. Our approach sharpens the classical epi- 
convergence results, as it provides deterministic bounding sets for the optimal 
decisions. 

Chapter 5 is devoted to the study of convex stochastic programs with a 
generalized nonconvex dependence on the random parameters. If the profit 
and constraint functions of some stochastic program are nonconvex in the 
random parameters, then the corresponding recourse functions generally fail 
to exhibit a saddle structure. Consequently, the barycentric approximation scheme cannot be shown to yield bounds on the optimal value and the recourse functions. We prove that, under certain conditions, the saddle structure can be restored by adding specific random variables to the profit functions. These random variables are referred to as ‘correction terms’. Loosely speaking, the correction terms compensate for the nonconvexities inherent in the underlying stochastic program, and they are required to be nonanticipative and independent of the decision variables. If such correction terms exist, then we may also
infer the existence of bounds on the optimal value and the recourse functions. 
However, in contrast to the well-behaved problems studied in Chap. 4, the 
plain stage t recourse functions of the auxiliary stochastic programs no longer 
provide bounds on the stage t recourse function of the original stochastic pro- 
gram. Instead, a lower (upper) bound is given by the lower (upper) auxiliary 
recourse function shifted by the conditional expectation of the sum of all cor- 
rection terms from stages t through T. Thereby, conditional expectation is 
taken with respect to the difference of the lower (upper) barycentric measure 
and the original measure. The bounds differ little from the plain auxiliary recourse functions if the correction terms are close to bilinear functionals with respect to the topology of uniform convergence or if the barycentric measures are close to the original probability measure with respect to the weak topology.

We formulate suitable regularity conditions which guarantee the existence of correction terms and, in spite of potential nonconvexities,¹ the existence of bounds. If the nonconvexities occur in the objective function, then the underlying stochastic program is required to satisfy some generalized regularity
conditions (C1)-(C5). While implied by (B1)-(B5), these new conditions allow 
for ‘regularizable’ profit functions, which need not be convex in the random 
parameters. Loosely speaking, a profit function is called regularizable if it can 
be transformed to a continuous saddle function on a convex neighborhood of 
its natural domain by adding a specific correction term. However, notice that 
the transformed profit function will generally fail to exhibit a saddle structure 
on the entire underlying space. It is also shown that the class of regularizable 
profit functions contains a large class of smooth profit functions. 

If the nonconvexities occur in the constraints, then we work with another set of generalized regularity conditions (D1)-(D5), which are implied by (B1)-(B5), as well. These new conditions allow for ‘regularizable’ constraint functions, which need not be convex in the random parameters. Informally speaking, a constraint function is called regularizable if it can be transformed to a
continuous convex function on a convex neighborhood of its natural domain 
by adding a nonanticipative random vector independent of the decision vari- 
ables. Such random vectors as well as specific bounding vectors for the dual 
solutions (i.e. the Lagrange multipliers) enter the definition of suitable cor- 
rection terms which - as pointed out before - reestablish the saddle structure 
of the recourse functions. It is shown that these correction terms always exist 
and that the class of regularizable constraint functions contains a large class 
of smooth constraint functions. 

If the nonconvexities occur both in the objective and the constraints, then 
we invoke a set of regularity conditions (E1)-(E5), which allow for both re- 
gularizable profit and constraint functions. Under these generalized regularity 
conditions, bounds on the optimal value and the recourse functions can sys- 
tematically be generated. Furthermore, the bounds can be tightened by means 
of the partitioning techniques described in Chap. 4. 

The full symmetry of the proposed bounding method is revealed when 
studying linear stochastic programs. Then, the profit and constraint func- 
tions are regularizable if and only if the objective function coefficients and 
the right hand side vectors are d.c. (i.e. representable as a difference of convex 
functions) while the recourse and technology matrices are deterministic. These 
requirements give rise to the regularity conditions (F1)-(F5). Moreover, the correction terms reflect the intrinsic primal-dual symmetry of linear (stochastic) programs. In fact, the correction terms associated with the nonconvexities in the objective pair specific bounding vectors for the primal solutions with the d.c. components of the objective function coefficients. Conversely, the correction terms associated with the nonconvexities in the constraints pair specific bounding vectors for the dual solutions (i.e. the Lagrange multipliers) with the d.c. components of the right hand side vectors.

¹ When talking about ‘nonconvexities’, here, we always refer to the nonconvex dependence of some mapping on the random parameters (and not on the decision variables).

Fig. 7.1. Interdependence of regularity conditions: the set A comprises all stochastic programs subject to the regularity conditions (A1)-(A5), etc. Thus, all problems in A are well-defined and solvable. Moreover, the classical barycentric approximation scheme is applicable for all problems in B, whereas the stochastic programs in C and D may exhibit certain nonconvexities in the objective and the constraints, respectively. E is the largest problem class for which the present work suggests tight bounds. Note that the set F only contains linear stochastic programs

In summary, Chap. 5 provides a useful recipe of how the saddle structure 
of the recourse functions can be reestablished in the presence of certain non- 
convexities: loosely speaking, it suffices to add suitable correction terms to 
the profit functions. After having found such correction terms, any bounding 
method for scenario generation (that requires the recourse functions to be 
saddle-shaped) can be used to calculate bounds. Although we usually work 
with the barycentric approximation scheme, other scenario generation techniques could in principle be used (e.g. the method by Edirisinghe [31]). If
the random parameters appear only in the objective or in the constraints of 
a stochastic program, then one might possibly find correction terms which 
render the recourse functions purely convex or concave, respectively. In this 
special case, the Jensen and Edmundson-Madansky-type inequalities could be 
used to derive bounds. 
Chapter 6 presents exemplary real-life applications of the theoretical concepts developed in this work. First, we formulate the basic decision problem of
a hydropower producer operating a pumped storage power plant. This linear 
model complies with the regularity conditions (B1)-(B5), and thus the classi- 
cal barycentric approximation scheme applies, yielding bounds on the optimal 
value. Next, we study specific generalizations of the basic decision problem. 
For instance, if the hydro generator has market power, then the objective 
function becomes quadratic in the decisions. Nevertheless, the model still sat- 
isfies the conditions (B1)-(B5), and the barycentric approximation scheme 
remains applicable. In order to solve the discretized auxiliary stochastic pro- 
grams by means of an LP solver, the quadratic terms in the objective are 
approximated by piecewise linear functions. In a further step, we assume that
the electricity spot prices are lognormally distributed. This modification may 
lead to nonconvexities in the objective. However, we argue that the underlying 
model is linear in the decisions and fulfills the generalized regularity condi- 
tions (C1)-(C5). Consequently, application of the barycentric approximation 
scheme requires that suitable correction terms be added to the profit func- 
tions. Subsequently, we consider a model with lognormally distributed reser- 
voir inflows. This model suffers from nonconvexities in the constraints, but 
it is shown to comply with the generalized regularity conditions (D1)-(D5). 
Here, as well, the barycentric approximation scheme fails to provide bounds, 
unless suitable correction terms are introduced. Finally, we investigate a gen- 
eralized model which maximizes expected utility instead of expected profit. 
As we work with a concave utility function, which accounts for risk aversion, 
the objective is nonlinear in the decisions as well as the random parameters. 
However, the underlying model manifestly satisfies the regularity conditions 
(C1)-(C5). Thus, the nonconvexities in the random parameters are compen- 
sated by adding suitable correction terms, as a consequence of which the use 
of the barycentric approximation scheme is justified. To tackle the arising 
auxiliary stochastic programs by means of an LP solver, the concave terms in 
the objective are approximated by piecewise linear functions. 

In all of the above examples, numerical experiments show that the relative gap between the bounds can be reduced to a few percent without an excessive increase in problem size.



7.2 Future Research 

In the present work we develop bounds on linear stochastic programs whose 
objective function coefficients and right hand side vectors are d.c. in the 
stochastic parameters, while the recourse and technology matrices are de- 
terministic. This result proves satisfactory since any continuous function on 
a compact domain can be approximated to arbitrary precision by d.c. functions. However, the requirement that the constraint matrices be non-stochastic is restrictive. Future research should focus on finding ways to relax this condition. It is uncertain whether this goal can be accomplished by further improving the bounding measure techniques considered in this work (see Sect. 1.2
for a survey on classical bounding measure techniques). In fact, allowing the 
constraint matrices to depend on the random parameters can heavily disturb 
the saddle structure of the recourse functions, and such distortions are usually 
hard to deal with. 

There exist alternative methods which are likely to provide bounds on 
at least some of the stochastic programs studied in this work. Instead of 
bounds derived from distributional approximations, one could e.g. work with 
restricted-recourse bounds, which are proposed in [74] for linear two-stage 
stochastic programs. One could also invoke scenario tree generation by opti- 
mal discretization [85] to derive bounds on stochastic programs with locally 
Lipschitz recourse functions. Of course, several other approaches might be 
worth pursuing, as well. The methods proposed in this work as well as possi- 
ble alternative methods should be assessed as far as their scope, accuracy of 
the bounds, and computational effort are concerned. 

In Sects. 4.6 and 5.5 we suggest bounding sets for the optimal first stage de- 
cisions of a given stochastic optimization problem. Calculation of these bound- 
ing sets involves the evaluation of the level sets of some concave value function. 
This, in turn, basically requires solution of a vast number of stochastic pro- 
grams with slightly varying input parameters. The design and implementation 
of efficient algorithms to evaluate these compact (and a fortiori bounded) level 
sets is an important task to be addressed in the future. 




A 



Conjugate Duality 



In this appendix we recall some basic elements of conjugate duality theory due 
to Rockafellar [90] (see also [64, Sect. 2.3]). Important applications of conju- 
gate duality theory in the field of convex stochastic optimization are presented 
in [92-96]. The methods and terminology introduced here are of vital importance for the derivation of various results in the subsequent appendices and
in the main text. 

Consider an extended-real-valued function $p: X\to[-\infty,\infty]$ on a real topological vector space X. p is said to be a concave function if its hypograph

$$\mathrm{hypo}\,p := \{(x,\alpha)\mid x\in X,\ \alpha\in\mathbb{R},\ \alpha\le p(x)\}$$

is a convex subset of the product space $X\times\mathbb{R}$. The effective domain of a concave function is defined as

$$\mathrm{dom}\,p := \{x\mid p(x) > -\infty\}.$$

By definition, dom p is obtained by projection of hypo p on X and constitutes a convex set. Furthermore, a concave function is called proper if $\mathrm{dom}\,p\neq\emptyset$ and $p(x)<\infty$ for all $x\in X$.

An arbitrary function $p: X\to[-\infty,\infty]$ is called upper semicontinuous (usc) if hypo p is a closed set with respect to the product topology on $X\times\mathbb{R}$. The usc hull usc p is the smallest usc function $\ge p$. By construction, we may conclude that

$$\mathrm{hypo}\,(\mathrm{usc}\,p) = \mathrm{cl}\,(\mathrm{hypo}\,p).$$

The concept of usc functions allows us to generalize the closure operation known for sets. We define the closure cl p of an extended-real-valued function p through

$$\mathrm{cl}\,p := \begin{cases}\mathrm{usc}\,p & \text{if } \mathrm{usc}\,p(x)<\infty\ \ \forall x\in X,\\ \infty & \text{else.}\end{cases}$$
By convention, p is called closed if p = cl p. Moreover, the convex hull co p is the smallest concave function $\ge p$. The hypograph of co p consists of the convex hull of the hypograph of p and a certain set of boundary points (by definition, the ‘vertical’ fibres of a hypograph must be closed for all fixed vectors $x\in X$):

$$\mathrm{hypo}\,(\mathrm{co}\,p) = \{(x,\alpha)\in X\times\mathbb{R}\mid (x,\beta)\in\mathrm{co}\,(\mathrm{hypo}\,p)\ \ \forall\beta<\alpha\}.$$

To the (real) linear space X is associated a dual linear space $X^*$ along with a bilinear form $\langle\cdot,\cdot\rangle: X\times X^*\to\mathbb{R}$. A topology on X is ‘compatible’ with this pairing if it is locally convex such that for each $x^*\in X^*$ the linear functional $x\mapsto\langle x^*,x\rangle$ is continuous, and every continuous linear functional on X can be represented in this form for some $x^*\in X^*$. Similarly, a topology on $X^*$ is compatible with the pairing if it is locally convex such that for each $x\in X$ the linear functional $x^*\mapsto\langle x^*,x\rangle$ is continuous, and every continuous linear functional on $X^*$ can be represented in this form for some $x\in X$. It is assumed that X and $X^*$ have been equipped with compatible topologies with respect to the given bilinear form (a topology compatible with a given pairing always exists and can be constructed systematically). Then, the (concave) conjugate $p^*$ of an arbitrary extended-real-valued function p on X is given by

$$p^*(x^*) := \inf_{x\in X}\,\{\langle x^*,x\rangle - p(x)\}, \tag{A.1}$$

and the biconjugate of p is defined as the conjugate of $p^*$:

$$p^{**}(x) := \inf_{x^*\in X^*}\,\{\langle x^*,x\rangle - p^*(x^*)\}. \tag{A.2}$$

Theorem A.1. For any function $p: X\to[-\infty,\infty]$ the following hold:

(i) $p^*$ is concave and closed;

(ii) $p^{**} = \mathrm{cl}\,\mathrm{co}\,p$.

The proof of theorem A.1 relies on a fundamental characterization of closed convex sets, as outlined in the following proposition.

Proposition A.2. Let C be a subset of a locally convex real vector space X, and assume that C is contained in some closed half-space. Then, the intersection of all the closed half-spaces containing C is cl co C.

Proof. Every closed half-space containing C necessarily covers cl co C. Without loss of generality we may thus assume C to be a strict subset of X which is convex and closed. Moreover, we may require $C\neq\emptyset$ since the claim is trivial otherwise. It remains to be shown that any such C can be expressed as the intersection of all closed half-spaces that contain it. To this end, choose a vector $x\notin C$. By a standard separation theorem for locally convex spaces (see e.g. theorem 3’ in [114, Sect. IV/6]) there exists a closed half-space H(x) such that $x\notin H(x)$ and $C\subset H(x)$. Consequently, the intersection of all closed half-spaces covering C contains no points in the complement of C. □

Proof of theorem A.1. (i) By definition, $p^*$ is the pointwise infimum of a family of linear affine functions $x^* \mapsto \langle x^*, x \rangle - p(x)$, $x \in \operatorname{dom} p$. The hypographs of these functions are closed half-spaces in $X^* \times \mathbb{R}$; their intersection, i.e. the hypograph of $p^*$, is a convex closed set. Therefore, the function $p^*$ is concave and upper semicontinuous (usc). As can easily be seen, $p^*$ either coincides with the constant function $+\infty$, or $p^*$ is a usc function which nowhere adopts the value $+\infty$. This observation implies closedness of $p^*$. Assertion (ii) is proved by reexpressing the biconjugate in terms of $p$:

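$$p^{**}(x) = \inf_{x^* \in X^*} \Big\{ \langle x^*, x \rangle + \sup_{\tilde x \in X} \{ p(\tilde x) - \langle x^*, \tilde x \rangle \} \Big\} \tag{A.3}$$

Since the supremum is defined as the least upper bound, we may write

$$\sup_{\tilde x \in X} \{ p(\tilde x) - \langle x^*, \tilde x \rangle \} = \inf \{ u \in \mathbb{R} \mid u \geq p(\tilde x) - \langle x^*, \tilde x \rangle \ \ \forall \tilde x \in X \}. \tag{A.4}$$

Substitution of (A.4) into (A.3) (with the offset $u$ shifted by $\langle x^*, x \rangle$) shows that $p^{**}(x)$ corresponds to the optimal value of a constrained minimization problem:

$$\begin{aligned} \inf_{(u, x^*) \in \mathbb{R} \times X^*} \ & u \\ \text{s.t.} \ & u + \langle x^*, \tilde x - x \rangle \geq p(\tilde x) \quad \forall \tilde x \in X. \end{aligned} \tag{A.5}$$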

Evidently, the mathematical program (A.5) has an intuitive geometric interpretation: the feasible set consists of all (continuous) linear affine majorants $U(\tilde x) = u + \langle x^*, \tilde x - x \rangle$ of the given function $p$, and the objective function can be written as $U(x)$ (this implies $p^{**}(x) \geq p(x)$). Equivalently, the hypograph of $p^{**}$ can be characterized as the intersection of all ‘nonvertical’ closed half-spaces in $X \times \mathbb{R}$ covering the hypograph of $p$ (these nonvertical half-spaces correspond to the hypographs of the linear affine majorants of $p$). An elementary argument based on proposition A.2 with $C = \operatorname{hypo} p$ implies $\operatorname{hypo} p^{**} = \operatorname{cl} \operatorname{co} \operatorname{hypo} p$. Hence we conclude that $p^{**} = \operatorname{cl} \operatorname{co} p$. □
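Theorem A.1 (ii) can be checked numerically in one dimension. The Python sketch below is a minimal illustration that rests on several assumptions not taken from the text. It replaces $X = X^* = \mathbb{R}$ by finite grids, so the infima in (A.1) and (A.2) become minima over grid points. The test function $p(x) = -(x^2 - 1)^2$, the grid bounds and the helper names are hypothetical choices. Because $p$ has peaks at $x = \pm 1$, its concave hull should equal $0$ on $[-1, 1]$ and coincide with $p$ near $x = \pm 2$.

```python
def p(x):
    """Hypothetical non-concave test function with peaks at x = -1 and x = +1."""
    return -((x * x - 1.0) ** 2)


def concave_conjugate(xs, values, slopes):
    """Grid version of (A.1): p*(s) = min over x in xs of s*x - p(x)."""
    return [min(s * x - v for x, v in zip(xs, values)) for s in slopes]


def biconjugate(xs, slopes, conj):
    """Grid version of (A.2): p**(x) = min over s in slopes of s*x - p*(s)."""
    return [min(s * x - c for s, c in zip(slopes, conj)) for x in xs]


xs = [-2.0 + 4.0 * i / 400 for i in range(401)]          # grid on [-2, 2], contains -1, 0, 1
slopes = [-30.0 + 60.0 * k / 600 for k in range(601)]    # dual grid covering the slopes of p
px = [p(x) for x in xs]
p_star = concave_conjugate(xs, px, slopes)
p_bistar = biconjugate(xs, slopes, p_star)

for x, v, w in zip(xs, px, p_bistar):
    if x in (-1.5, 0.0, 0.5, 1.5):
        print(f"x = {x:5.2f}   p(x) = {v:8.4f}   p**(x) = {w:8.4f}")
```

In the printed rows, $p^{**}$ should fill the dip of $p$ between the two peaks and leave $p$ unchanged at $x = \pm 1.5$.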

In the sequel we consider an abstract mathematical programming problem over a real topological vector space $X$:

$$\sup_{x \in X} p(x). \tag{A.6}$$

The extended-real-valued objective function $p : X \to [-\infty, \infty]$ is assumed to be concave and accounts for possible restrictions on $x$ such that the feasible set of (A.6) is given by $\operatorname{dom} p$ (see appendix B for an example). The basic idea of duality theory is the embedding of problem (A.6) into a family of parameterized maximization problems. Thereby, the perturbation parameter $d$ ranges over an additional real topological vector space $D$, and the perturbed objective function $P : X \times D \to [-\infty, \infty)$ has the following properties:

(a) $P(x, 0) = p(x)$ for all $x \in X$;

(b) $P(\cdot)$ is concave in $(x, d)$;

(c) $P(x, \cdot)$ is closed for all $x \in X$.

We define the optimal value function $\Phi : D \to [-\infty, \infty]$ as

$$\Phi(d) := \sup_{x \in X} P(x, d).$$

By construction, the optimal value of the unperturbed problem (A.6) coincides with $\Phi(0)$. As outlined in the following proposition, the concavity assumption on $P$ implies that the value function $\Phi$ is concave, too.

Proposition A.3. $\Phi$ is a concave function.

Proof. Denote by $E$ the projection of $\operatorname{hypo} P$ on $D \times \mathbb{R}$. The hypograph of $\Phi$ coincides with $E$ except for some special boundary points (the ‘vertical’ fibres of $\operatorname{hypo} \Phi$ must be closed for all fixed vectors $d \in D$):

$$\operatorname{hypo} \Phi = \{ (d, \alpha) \in D \times \mathbb{R} \mid (d, \beta) \in E \ \ \forall \beta < \alpha \}.$$

By construction, the hypograph of the perturbation function $P$ is convex, and since convexity is preserved under projections, $E$ is convex, as well. This entails convexity of $\operatorname{hypo} \Phi$, and therefore $\Phi$ is a concave function. □

Let us associate to the linear space $D$ a dual space $D^*$ along with a bilinear form $\langle \cdot, \cdot \rangle : D^* \times D \to \mathbb{R}$. It is assumed that $D$ and $D^*$ have been equipped with compatible topologies with respect to the given bilinear form. Then we are prepared to define the Lagrangian function

$$L(x, d^*) := \sup_{d \in D} \{ P(x, d) - \langle d^*, d \rangle \} = -P^*(x, d^*), \tag{A.7}$$

where $P^*(x, \cdot)$ is the conjugate of $P(x, \cdot)$. As usual, such a Lagrangian function can be utilized to define a primal maximization problem

$$\sup_{x \in X} \inf_{d^* \in D^*} L(x, d^*) \tag{A.8}$$

as well as a corresponding dual minimization problem

$$\inf_{d^* \in D^*} \sup_{x \in X} L(x, d^*). \tag{A.9}$$

The objective function of the primal problem (A.8) reads

$$\inf_{d^* \in D^*} L(x, d^*) = \inf_{d^* \in D^*} \{ \langle d^*, 0 \rangle - P^*(x, d^*) \} = P^{**}(x, 0) = P(x, 0) = p(x).$$

The third equality follows from theorem A.1, since $P(x, \cdot)$ was assumed to be concave and closed. Therefore, the primal problem (A.8) exactly coincides with the original optimization problem (A.6). For the further study of the mathematical programs (A.8) and (A.9) we need a fundamental result about the interchangeability of ‘inf’ and ‘sup’ operators.

Proposition A.4. Consider an extended-real-valued function $f$ defined on the Cartesian product of two arbitrary sets $X$ and $Y$, $f : X \times Y \to [-\infty, \infty]$. Then the following hold:

(i) $\sup_{x \in X} \sup_{y \in Y} f(x, y) = \sup_{y \in Y} \sup_{x \in X} f(x, y)$;

(ii) $\inf_{x \in X} \inf_{y \in Y} f(x, y) = \inf_{y \in Y} \inf_{x \in X} f(x, y)$;

(iii) $\sup_{x \in X} \inf_{y \in Y} f(x, y) \leq \inf_{y \in Y} \sup_{x \in X} f(x, y)$;

(iv) $\inf_{x \in X} \sup_{y \in Y} f(x, y) \geq \sup_{y \in Y} \inf_{x \in X} f(x, y)$.



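Proof. Assertion (i) basically relies on a sequential application of the involved ‘sup’ operators:

$$\begin{aligned} & \sup_{y' \in Y} f(x, y') \geq f(x, y) && \forall y \in Y,\ x \in X \\ \Longrightarrow\ & \sup_{x \in X} \sup_{y' \in Y} f(x, y') \geq \sup_{x \in X} f(x, y) && \forall y \in Y \\ \Longrightarrow\ & \sup_{x \in X} \sup_{y \in Y} f(x, y) \geq \sup_{y \in Y} \sup_{x \in X} f(x, y). \end{aligned}$$

The converse inequality holds by symmetry, and thus the claim is proved. Statement (ii) is equivalent to (i) (apply (i) to $-f$). Moreover, assertion (iii) follows from an analogous argument:

$$\begin{aligned} & \inf_{y' \in Y} f(x, y') \leq f(x, y) && \forall y \in Y,\ x \in X \\ \Longrightarrow\ & \sup_{x \in X} \inf_{y' \in Y} f(x, y') \leq \sup_{x \in X} f(x, y) && \forall y \in Y \\ \Longrightarrow\ & \sup_{x \in X} \inf_{y \in Y} f(x, y) \leq \inf_{y \in Y} \sup_{x \in X} f(x, y). \end{aligned}$$

Due to lack of symmetry the reversed inequality is not always fulfilled. Finally, statement (iv) is equivalent to (iii). □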
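For finite sets $X$ and $Y$, every supremum and infimum in proposition A.4 is a maximum or minimum over a matrix, so assertion (iii) can be checked directly. The Python sketch below uses hypothetical payoff matrices that do not come from the text. A matrix with a saddle point gives equality, and the ‘matching pennies’ matrix gives a strict inequality, which shows why the reversed inequality in (iii) can fail.

```python
import random


def sup_inf(f):
    """sup over rows x of inf over columns y of f[x][y] (finite X and Y)."""
    return max(min(row) for row in f)


def inf_sup(f):
    """inf over columns y of sup over rows x of f[x][y]."""
    return min(max(row[j] for row in f) for j in range(len(f[0])))


saddle = [[3, 1],
          [4, 2]]        # saddle point at row 1, column 1 (0-based): value 2
pennies = [[1, -1],
           [-1, 1]]      # no saddle point

print(sup_inf(saddle), inf_sup(saddle))     # equal values
print(sup_inf(pennies), inf_sup(pennies))   # strict inequality

# Assertion (iii) holds for every matrix; spot-check it on random ones.
rng = random.Random(0)
for _ in range(1000):
    f = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
    assert sup_inf(f) <= inf_sup(f)
```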

Let us now investigate the objective function $\sigma : D^* \to [-\infty, \infty]$ of the dual problem (A.9).

$$\begin{aligned} \sigma(d^*) &:= \sup_{x \in X} L(x, d^*) \\ &= \sup_{x \in X} \sup_{d \in D} \{ P(x, d) - \langle d^*, d \rangle \} && \text{(definition of } L(\cdot)) \\ &= \sup_{d \in D} \sup_{x \in X} \{ P(x, d) - \langle d^*, d \rangle \} && \text{(proposition A.4)} \\ &= \sup_{d \in D} \{ \Phi(d) - \langle d^*, d \rangle \} && \text{(definition of } \Phi(\cdot)) \\ &= -\Phi^*(d^*) \end{aligned} \tag{A.10}$$



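Thus, the optimal value of the dual problem reduces to

$$\inf_{d^* \in D^*} \sigma(d^*) = \inf_{d^* \in D^*} \{ \langle d^*, 0 \rangle - \Phi^*(d^*) \} = \Phi^{**}(0).$$

By theorem A.1 and proposition A.3, the inequality $\Phi \leq \operatorname{cl} \operatorname{co} \Phi = \operatorname{cl} \Phi = \Phi^{**}$, evaluated at $d = 0$, translates to the weak duality statement:¹

$$\sup_{x \in X} p(x) \leq \inf_{d^* \in D^*} \sigma(d^*).$$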

If $\sup_X p < \inf_{D^*} \sigma$, the difference $\inf_{D^*} \sigma - \sup_X p$ is referred to as the duality gap in the literature. Problem (A.6) relative to its embedding $P$ is called normal if $\sup_X p = \inf_{D^*} \sigma$, i.e. if strong duality holds. Moreover, problem (A.6) is called stable if it is normal and the infimum in the dual problem is attained, $\sup_X p = \min_{D^*} \sigma$.
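Weak duality, the duality gap and stability can be made concrete on a hypothetical one-dimensional instance that is not taken from the text: maximize $p(x) = -(x-2)^2$ subject to $f(x) = x - 1 \leq 0$, with the classical Lagrangian $L(x, d^*) = p(x) - d^* f(x)$ for $d^* \geq 0$. The maximizer $x = 2 - d^*/2$ of $L(\cdot, d^*)$ gives the closed form $\sigma(d^*) = d^{*2}/4 - d^*$. The Python sketch below assumes finite grids in place of $\mathbb{R}$. It evaluates $\sigma$ on a grid and checks the following: every $\sigma(d^*)$ bounds the primal value $-1$ from above, and the minimum is attained near $d^* = 2$. This instance therefore has no duality gap and is stable.

```python
def p(x):
    return -(x - 2.0) ** 2            # hypothetical concave objective


def f(x):
    return x - 1.0                    # hypothetical convex constraint, f(x) <= 0


def lagrangian(x, dual):
    """Classical Lagrangian p(x) - d* f(x) for d* >= 0."""
    return p(x) - dual * f(x)


xs = [-5.0 + 10.0 * i / 1000 for i in range(1001)]   # grid on [-5, 5], step 0.01, contains x = 1
duals = [k / 100 for k in range(501)]                 # grid of d* in [0, 5]

primal_value = max(p(x) for x in xs if f(x) <= 0.0)  # sup over the feasible set
sigma = [max(lagrangian(x, u) for x in xs) for u in duals]
closed_form = [u * u / 4.0 - u for u in duals]

k_best = min(range(len(duals)), key=lambda k: sigma[k])
print(f"primal value {primal_value:.4f}, dual value {sigma[k_best]:.4f} at d* = {duals[k_best]:.2f}")
```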

In order to investigate the structural properties of the dual objective function $\sigma$, it should be expressed in terms of the value function (cf. (A.10)):

$$\sigma(d^*) = \sup_{d \in D} \{ \Phi(d) - \langle d^*, d \rangle \}. \tag{A.11}$$

As in the proof of theorem A.1, the ‘sup’ operator can be eliminated such that (A.11) reduces to a parametric optimization problem over $\mathbb{R}$:

$$\sigma(d^*) = \inf \{ u \in \mathbb{R} \mid u + \langle d^*, d \rangle \geq \Phi(d) \ \ \forall d \in D \}. \tag{A.12}$$

Notice that the feasible set of (A.12) is given by a family of parallel linear affine majorants $U(d) = u + \langle d^*, d \rangle$ of the optimal value function (the ‘gradient’ $d^*$ is fixed whereas the offset $u$ represents a decision variable), and the objective function amounts to $U(0)$. From (A.12) it is evident that the choice of the embedding $P$ (and the corresponding value function $\Phi$) substantially influences the dual objective function $\sigma$. It is also clear that the dual feasible set $\operatorname{dom} \sigma$ is determined by the asymptotic behavior of $\Phi$. Proposition A.5 characterizes the optimal solution set of the dual problem.

¹ Alternatively, from proposition A.4 (iii) it is immediately clear that the supremum of the primal problem (A.8) is smaller than or equal to the infimum of the dual problem (A.9).

Proposition A.5. The following statements are equivalent:

(i) $d^*$ is an optimal solution of the dual problem (A.9);

(ii) $d^* \in \partial \Phi^{**}(0)$.

Proof. Since $\Phi^{**}(0)$ is the minimal value of the dual problem, it is always true that $\Phi^{**}(0) \leq \sigma(d^*)$ for all $d^* \in D^*$. Moreover, (A.12) entails the following sequence of equivalent statements:

$$\begin{aligned} & \Phi^{**}(0) = \sigma(d^*) \\ \iff\ & \Phi^{**}(0) + \langle d^*, d \rangle \geq \Phi(d) && \forall d \in D \\ \iff\ & \Phi^{**}(0) + \langle d^*, d \rangle \geq \Phi^{**}(d) && \forall d \in D \\ \iff\ & d^* \in \partial \Phi^{**}(0). \end{aligned}$$

Thus, every subgradient $d^* \in \partial \Phi^{**}(0)$ constitutes an optimal solution of (A.9) and vice versa. □

Corollary A.6. The dual problem (A.9) is solvable if and only if $\Phi^{**}$ is subdifferentiable at the origin.
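By proposition A.5, the dual solutions are exactly the supergradients of $\Phi^{**}$ at the origin, so a kink of the value function at $d = 0$ produces a whole interval of dual solutions. The Python sketch below assumes a hypothetical closed concave value function $\Phi(d) = \min\{2d, d\}$, whose superdifferential at $0$ is $[1, 2]$. It evaluates (A.11) on a truncated grid of $d$ values, so $\sigma$ is only approximated: values that are $+\infty$ in exact arithmetic appear as large finite numbers. The set of grid minimizers should nevertheless be exactly the grid points in $[1, 2]$.

```python
def phi(d):
    """Hypothetical concave value function: slope 2 for d < 0, slope 1 for d > 0."""
    return min(2.0 * d, d)


ds = [-100.0 + 0.5 * i for i in range(401)]   # truncated grid for D = R, contains d = 0


def sigma(dual):
    """Dual objective (A.11): sup_d { Phi(d) - d* d }, approximated on the grid ds."""
    return max(phi(d) - dual * d for d in ds)


duals = [k / 10 for k in range(31)]            # candidate d* in [0, 3]
values = [sigma(u) for u in duals]
dual_value = min(values)                       # approximates Phi**(0)
solutions = [u for u, v in zip(duals, values) if v <= dual_value + 1e-9]
print(dual_value, phi(0.0), solutions[0], solutions[-1])
```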




B 



Lagrangian Duality 



Now we exploit the results of appendix A in order to develop a specific Lagrangian duality scheme for constrained maximization problems over $X = \mathbb{R}^n$. In particular, let us study convex mathematical programs of the form

$$\sup_x \ p(x) \quad \text{s.t.} \ f(x) \leq 0. \tag{B.1}$$

The objective function $p : \mathbb{R}^n \to [-\infty, \infty)$ of (B.1) is concave, whereas the vector-valued constraint function $f : \mathbb{R}^n \to \mathbb{R}^r$ is assumed to be (componentwise) convex. As usual, the constraints can be made implicit by passing over to an ‘effective’ objective function:

$$\bar p(x) = \begin{cases} p(x) & \text{for } f(x) \leq 0, \\ -\infty & \text{else.} \end{cases} \tag{B.2}$$



By means of the effective objective function (B.2), the mathematical program (B.1) can be represented as an unconstrained optimization problem (this formulation is compatible with the conventions met in appendix A). For technical reasons we require (B.1) to satisfy some regularity conditions. First, the feasible set $\mathcal{X} := \{ x \mid f(x) \leq 0 \}$ must be a compact subset of $\operatorname{int} \operatorname{dom} p$. Furthermore, we assume that $p$ is continuous on $\operatorname{int} \operatorname{dom} p$, and that there is a Slater point $y \in \mathcal{X}$ with the property $f(y) < 0$. Notice that these specifications do not allow for equality constraints. For didactic reasons, the investigation of equality constraints is postponed to the end of this appendix.

As outlined in appendix A, the formulation of a dual minimization problem is based on a convex embedding of the primal problem (B.1) into a family of perturbed maximization problems. A convenient embedding is defined through

$$P(x, d) := \begin{cases} p(x) & \text{for } f(x) \leq d, \\ -\infty & \text{else,} \end{cases} \tag{B.3}$$

where the perturbation parameter $d$ ranges over $D = \mathbb{R}^r$. In many applications, the embedding (B.3) is particularly useful for sensitivity analysis. Some basic properties of the associated value function are summarized in the subsequent proposition.

Fig. B.1. The value function is concave and bounded below on a closed ball $V$ centered at the origin. As $\Phi(0)$ is finite, it is geometrically clear that the graph of $\Phi$ over $V$ lies within the shaded region and is also bounded above.

Proposition B.1. The value function $\Phi(d) = \sup_x P(x, d)$ corresponding to the embedding (B.3) has the following properties:

(i) $\Phi$ is proper, concave, and finite on a neighborhood of the origin;

(ii) $\Phi(\tilde d) \geq \Phi(d)$ for $\tilde d \geq d$, i.e. $\Phi$ is monotone (nondecreasing).



Proof. (i) By construction, the embedding $P$ is jointly concave in $(x, d)$, and thus proposition A.3 implies that $\Phi$ is concave. By the assumptions on the objective function and the constraints, it can easily be verified that $\Phi(0)$ is finite. Moreover, the embedding $P(y, \cdot)$ is continuous on a closed ball $V$ centered at $d = 0$ if the Slater point $y$ is held fixed. Thus, $\Phi$ is bounded below by $m = \inf \{ P(y, d) \mid d \in V \}$ on $V$. In addition, concavity and finiteness at the origin entail that $\Phi$ is bounded above by $M = 2\Phi(0) - m$ on $V$. Indeed, $d \in V$ implies $-d \in V$, and concavity yields $\Phi(0) \geq \frac{1}{2}\Phi(d) + \frac{1}{2}\Phi(-d) \geq \frac{1}{2}\Phi(d) + \frac{1}{2} m$ (cf. Fig. B.1 for the geometric motivation behind this argument). Hence, $\Phi$ is finite on $V$. Again by concavity, we find $\Phi < \infty$ on $\mathbb{R}^r$, which implies properness.

(ii) The value function $\Phi$ grows with $d$ since the restrictions are relaxed as the perturbation parameter is increased. □

For later use, the effective domain of the value function $\mathcal{D} := \operatorname{dom} \Phi$ has to be analyzed.

Corollary B.2. The assertions of proposition B.1 imply:

(i) $\mathcal{D}$ is non-empty and convex (not necessarily closed);

(ii) $d \in \mathcal{D} \Rightarrow d + \mathbb{R}^r_+ \subset \mathcal{D}$ and $d \in \mathcal{D}^c \Rightarrow d + \mathbb{R}^r_- \subset \mathcal{D}^c$.
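The bounds used in the proof of proposition B.1 can be traced on a hypothetical instance of (B.1), $p(x) = -(x-2)^2$ and $f(x) = x - 1$. For this instance the embedding (B.3) gives $\Phi(d) = -(\min\{2, 1 + d\} - 2)^2$. For illustration, take the Slater point $y = 0$ and the ball $V = [-1/2, 1/2]$. Both choices are assumptions, not taken from the text. Then the lower bound is $m = p(y) = -4$ and the upper bound is $M = 2\Phi(0) - m = 2$. The Python sketch below checks these bounds, together with the monotonicity in proposition B.1 (ii) and the concavity of $\Phi$ on a grid.

```python
def value_function(d):
    """Phi(d) = sup{ -(x-2)^2 : x - 1 <= d } for the hypothetical instance above."""
    x_best = min(2.0, 1.0 + d)        # unconstrained maximizer x = 2, cut off at x <= 1 + d
    return -(x_best - 2.0) ** 2


y = 0.0                               # Slater point: f(y) = -1 < 0
radius = 0.5                          # V = [-radius, radius]; f(y) <= d holds for all d in V
m = -(y - 2.0) ** 2                   # P(y, d) = p(y) on V, hence m = p(y)
M = 2.0 * value_function(0.0) - m     # bound 2*Phi(0) - m from the proof

ball = [-radius + i * (2.0 * radius) / 100 for i in range(101)]
on_ball = [value_function(d) for d in ball]
print(f"m = {m}, M = {M}, Phi on V ranges over [{min(on_ball):.4f}, {max(on_ball):.4f}]")

wide = [-3.0 + 0.01 * i for i in range(601)]          # grid on [-3, 3]
vals = [value_function(d) for d in wide]
nondecreasing = all(a <= b for a, b in zip(vals, vals[1:]))
concave = all(vals[j - 1] - 2.0 * vals[j] + vals[j + 1] <= 1e-12 for j in range(1, len(vals) - 1))
```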

Since we concentrate on finite-dimensional problems, we do not have to consider paired topological spaces. By means of the inner product, any linear functional $\psi : \mathbb{R}^r \to \mathbb{R}$ on the vector space of perturbation parameters can be identified with an element $d^*$ of $\mathbb{R}^r$, i.e. $\psi(d) = \langle d^*, d \rangle = \sum_i d^*_i d_i$. With this convention the general definition (A.7) leads to the classical Lagrangian function

$$L(x, d^*) = \begin{cases} p(x) - \langle d^*, f(x) \rangle & \text{for } d^* \geq 0, \\ +\infty & \text{else.} \end{cases} \tag{B.4}$$

The primal problem associated with the Lagrangian (B.4) can be written as

$$\sup_x \inf_{d^* \geq 0} \ p(x) - \langle d^*, f(x) \rangle, \tag{B.5}$$

whereas the dual problem reads

$$\inf_{d^* \geq 0} \sup_x \ p(x) - \langle d^*, f(x) \rangle. \tag{B.6}$$

Evidently, the dual objective function amounts to

$$\sigma(d^*) = \begin{cases} \sup_x p(x) - \langle d^*, f(x) \rangle & \text{for } d^* \geq 0, \\ +\infty & \text{else,} \end{cases}$$

and the dual feasible set is determined by $\mathcal{D}^* := \operatorname{dom} \sigma$.¹ Unsurprisingly, $\mathcal{D}^*$ is a convex set. However, in contrast to the primal feasible set, $\mathcal{D}^*$ is unbounded and not necessarily closed. Below, these results will be discussed in more detail.

¹ It is standard practice to define the effective domain of a convex function $\sigma$ as $\operatorname{dom} \sigma := \{ d^* \mid \sigma(d^*) < \infty \}$.

Proposition B.3. The dual feasible set $\mathcal{D}^*$ is convex and unbounded.

Proof. Since $\Phi^*$ is a concave function, the dual feasible set $\mathcal{D}^* = \operatorname{dom} \sigma = \operatorname{dom} \Phi^*$ is convex (cf. (A.10)). In order to prove unboundedness of $\mathcal{D}^*$, we have to show in a first step that the effective domain of the value function is not $\mathbb{R}^r$. To this end we introduce the auxiliary function

$$\bar f(x) := \max \{ f_i(x) \mid i = 1, \ldots, r \}.$$

Obviously, $\bar f$ is convex and continuous on the entire decision space. Compactness of $\mathcal{X}$ implies that $\bar d := \min_{x \in \mathcal{X}} \bar f(x)$ is a finite number, and $\bar d \leq \bar f(y) < 0$. Thus we find that

$$\{ x \mid f(x) \leq d\, e \} = \emptyset \quad \forall d < \bar d,$$

where $e = (1, \ldots, 1)$ is the $r$-dimensional vector of ones. Indeed, any $x$ with $f(x) \leq d\, e$ satisfies $\bar f(x) \leq d < \bar d < 0$ and would thus be an element of $\mathcal{X}$ with $\bar f(x) < \bar d$. By construction, the vectors $d\, e$ are no elements of $\mathcal{D}$, for all $d < \bar d$. By corollary B.2 (ii), every vector that is componentwise smaller than $\bar d\, e$ lies outside $\mathcal{D}$, and these vectors form an open set. Consequently, compactness of $\mathcal{X}$ ensures that neither the effective domain of the value function nor its closure coincide with the underlying space of perturbation parameters, i.e. $\operatorname{cl} \mathcal{D} \neq \mathbb{R}^r$. Next, choose an arbitrary vector $d_0 \in (\operatorname{cl} \mathcal{D})^c$. According to a standard separation theorem for convex sets, there exists a linear affine functional $\psi(d) = \langle g^*, d - d_0 \rangle$ such that $\psi(d) > 0$ for all $d \in \operatorname{cl} \mathcal{D}$. Note that the vector $g^* \in \mathbb{R}^r$ can be interpreted as the gradient of $\psi$. The further argumentation relies on the following characterization of the dual feasible set $\mathcal{D}^*$, which is motivated by (A.12):

$$\mathcal{D}^* = \{ d^* \in \mathbb{R}^r \mid \exists u \in \mathbb{R} \text{ such that } u + \langle d^*, d \rangle \geq \Phi(d) \ \ \forall d \in \mathbb{R}^r \} \neq \emptyset.$$

Thus, for every $d^* \in \mathcal{D}^*$ there exists a real number $u$ and a linear affine function $U(d) = u + \langle d^*, d \rangle$ that majorizes the value function $\Phi(d)$. By construction, the linear affine function $U(d) + \lambda \psi(d)$ is a majorant of $\Phi(d)$ as well, for any $\lambda \in \mathbb{R}_+$, since $\psi \geq 0$ on $\operatorname{cl} \mathcal{D}$ and $\Phi = -\infty$ outside $\mathcal{D}$. Consequently, its gradient $d^* + \lambda g^*$ is an element of $\mathcal{D}^*$, and the dual feasible set is unbounded since the parameter $\lambda$ may tend to infinity. □

The dual feasible set $\mathcal{D}^*$ is not only unbounded, but it is not even necessarily closed. For example, consider a one-dimensional convex optimization problem whose objective function is given by

$$p(x) = \begin{cases} \ln(x + 1) & \text{for } x \geq 0, \\ x & \text{else.} \end{cases}$$

Assume that there is a convex constraint function of the form $f(x) = |x| - 1$. Thus, the feasible set $\mathcal{X} = [-1, 1]$ is compact and contains a Slater point. Moreover, $p$ is concave, $f$ is convex, and both are continuous. A straightforward calculation shows that the standard embedding (B.3) leads to the following value function:

$$\Phi(d) = \sup \{ p(x) \mid f(x) \leq d \} = \begin{cases} \ln(d + 2) & \text{for } d \geq -1, \\ -\infty & \text{else.} \end{cases}$$

In a next step we determine the conjugate of $\Phi$:

$$\Phi^*(d^*) = \inf_d \{ d^* d - \Phi(d) \} = \begin{cases} -d^* & \text{for } 1 \leq d^*, \\ 1 - 2d^* + \ln(d^*) & \text{for } 0 < d^* < 1, \\ -\infty & \text{else.} \end{cases}$$

For $0 < d^* < 1$ the infimum is attained at $d = 1/d^* - 2 \geq -1$, for $d^* \geq 1$ it is attained at the boundary point $d = -1$, and for $d^* \leq 0$ the objective is unbounded below as $d \to \infty$. From general theory we know that the dual feasible set $\mathcal{D}^*$ is given by the effective domain of $\Phi^*$, and thus $\mathcal{D}^* = (0, \infty)$ is an open subset of $\mathbb{R}$.
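The closed-form conjugate and the claim $\mathcal{D}^* = (0, \infty)$ can be cross-checked numerically. The Python sketch below is an illustration only, and its helper names are hypothetical. It evaluates $\Phi^*(d^*)$ by a ternary search over $d \in [-1, 10^6]$. This is valid because $d \mapsto d^* d - \ln(d + 2)$ is convex, and it assumes $d^* > 0$ so that the minimizer stays finite. The sketch compares the result with the formula and then minimizes the dual objective $\sigma = -\Phi^*$ (cf. (A.10)) on a grid. The minimum $\ln 2 = \Phi(0)$ should be attained at $d^* = 1/2 = \Phi'(0)$, in line with proposition A.5. Meanwhile $\sigma(d^*)$ grows without bound as $d^* \downarrow 0$.

```python
import math


def phi(d):
    """Value function of the example: ln(d + 2) for d >= -1, -inf otherwise."""
    return math.log(d + 2.0) if d >= -1.0 else -math.inf


def conj_closed_form(u):
    """Concave conjugate Phi*(u) as stated in the text."""
    if u >= 1.0:
        return -u
    if u > 0.0:
        return 1.0 - 2.0 * u + math.log(u)
    return -math.inf


def conj_numeric(u, upper=1e6, iters=300):
    """Phi*(u) = inf_{d >= -1} { u*d - Phi(d) } for u > 0, by ternary search on [-1, upper]."""
    def g(d):
        return u * d - phi(d)         # convex in d on [-1, inf)

    lo, hi = -1.0, upper
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) <= g(m2):
            hi = m2                   # a minimizer lies in [lo, m2]
        else:
            lo = m1                   # a minimizer lies in [m1, hi]
    return g(0.5 * (lo + hi))


for u in (0.05, 0.3, 0.7, 1.0, 2.5):
    print(f"d* = {u:4.2f}: closed form {conj_closed_form(u):9.5f}, numeric {conj_numeric(u):9.5f}")

# Dual objective sigma(d*) = -Phi*(d*) on a grid of the dual feasible set (0, inf).
duals = [k / 1000 for k in range(1, 3001)]
sigma = [-conj_closed_form(u) for u in duals]
k_best = min(range(len(duals)), key=lambda k: sigma[k])
print(f"min sigma = {sigma[k_best]:.6f} at d* = {duals[k_best]}, Phi(0) = {phi(0.0):.6f}")
```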

It is worthwhile to remark that the dual feasible set differs from the set of all subgradients of $\Phi^{**}$ only by specific boundary points. Formally speaking, the range of the subdifferential multifunction $\partial \Phi^{**}$ is defined as

$$\operatorname{range} \partial \Phi^{**} := \bigcup_{d \in \mathbb{R}^r} \partial \Phi^{**}(d).$$

Since $\Phi^{**}$ is a closed proper concave function, the following inclusions hold:

$$\operatorname{int} \mathcal{D}^* = \operatorname{int} \operatorname{dom} \Phi^* \subset \operatorname{range} \partial \Phi^{**} \subset \operatorname{dom} \Phi^* = \mathcal{D}^*$$

(cf. [89, p. 227]). The analysis of the value function and its subdifferential multifunction not only yields a convenient characterization of the dual feasible set but also provides fundamental insights concerning solvability and stability of the primal-dual pair of mathematical programs (B.5) and (B.6).

Proposition B.4. There exists a convex neighborhood $V$ of $d = 0$ such that the value function $\Phi$ is continuous and subdifferentiable on $V$.

Proof. This proposition is closely related to theorem 3.18 and can be proved in a similar manner. By proposition B.1, $\Phi$ is proper, concave, and finite on a neighborhood $V$ of the origin. Without loss of generality we may assume that $V$ is open and convex. Thus, by [89, theorem 10.1], $\Phi$ is continuous on $V$. Moreover, [89, theorem 23.4] entails subdifferentiability of $\Phi$ on $V$. □

As $\Phi$ is a proper concave function (see proposition B.1), we have $\operatorname{cl} \Phi = \operatorname{usc} \Phi$. In addition, by continuity we find $\operatorname{cl} \Phi = \Phi$ on $V$. This implies $\Phi(0) = \Phi^{**}(0)$, and thus the optimization problem (B.1) with the embedding (B.3) is normal (i.e. strong duality holds). Notice that the primal problem (B.5) is solvable since $\mathcal{X}$ is compact and the objective function $p$ is continuous on $\mathcal{X}$. Moreover, the dual problem (B.6) is solvable as well since $\Phi^{**} = \operatorname{cl} \Phi$ is subdifferentiable at the origin (cf. corollary A.6). Hence, the optimization problem under consideration is stable.

If we allow for equality constraints in (B.1), the Lagrangian duality scheme developed above must be slightly modified. Let us therefore investigate a mathematical program of the form

$$\begin{aligned} \sup_x \ & p(x) \\ \text{s.t.} \ & f^{\mathrm{in}}(x) \leq 0, \\ & f^{\mathrm{eq}}(x) = 0. \end{aligned} \tag{B.7}$$

As before, the objective function $p : \mathbb{R}^n \to [-\infty, \infty)$ of (B.7) is concave, whereas the constraint function $f^{\mathrm{in}} : \mathbb{R}^n \to \mathbb{R}^{r^{\mathrm{in}}}$ belonging to the inequality constraints is assumed to be convex. Moreover, the constraint function of the equality constraints $f^{\mathrm{eq}} : \mathbb{R}^n \to \mathbb{R}^{r^{\mathrm{eq}}}$ is required to be linear affine. As before, we impose certain regularity conditions. Above all, the feasible set

$$\mathcal{X} := \{ x \mid f^{\mathrm{in}}(x) \leq 0,\ f^{\mathrm{eq}}(x) = 0 \}$$

must be a compact subset of $\operatorname{int} \operatorname{dom} p$. Furthermore, we assume that $p$ is continuous on $\operatorname{int} \operatorname{dom} p$, and that (B.7) satisfies Slater’s constraint qualification. That is, the gradients of the equality constraints are linearly independent, and there is a Slater point $y \in \mathcal{X}$ which is feasible and strictly satisfies the inequality constraints, i.e. $f^{\mathrm{in}}(y) < 0$ and $f^{\mathrm{eq}}(y) = 0$. Then, we may choose an embedding similar to (B.3):

$$P(x, d^{\mathrm{in}}, d^{\mathrm{eq}}) := \begin{cases} p(x) & \text{for } f^{\mathrm{in}}(x) \leq d^{\mathrm{in}} \text{ and } f^{\mathrm{eq}}(x) = d^{\mathrm{eq}}, \\ -\infty & \text{else.} \end{cases} \tag{B.8}$$

Evidently, the perturbation parameter $(d^{\mathrm{in}}, d^{\mathrm{eq}})$ ranges over the product space $\mathbb{R}^{r^{\mathrm{in}}} \times \mathbb{R}^{r^{\mathrm{eq}}}$. Some essential properties of the associated optimal value function are provided by the following proposition.

Proposition B.5. The value function $\Phi(d^{\mathrm{in}}, d^{\mathrm{eq}}) = \sup_x P(x, d^{\mathrm{in}}, d^{\mathrm{eq}})$ corresponding to the embedding (B.8) is proper, concave, and finite on a neighborhood of the origin.



Proof. By construction, the embedding $P$ is jointly concave in $(x, d^{\mathrm{in}}, d^{\mathrm{eq}})$, and thus proposition A.3 implies that $\Phi$ is concave. By the assumptions on the objective function and the constraints, it can easily be verified that $\Phi(0, 0)$ is finite. Moreover, due to Slater’s constraint qualification, there is a closed ball $V$ centered at $(d^{\mathrm{in}}, d^{\mathrm{eq}}) = (0, 0)$ and a continuous function $x : V \to \mathbb{R}^n$ such that the image of $x$ is a subset of $\operatorname{int} \operatorname{dom} p$,

$$f^{\mathrm{in}}(x(d^{\mathrm{in}}, d^{\mathrm{eq}})) \leq d^{\mathrm{in}}, \quad \text{and} \quad f^{\mathrm{eq}}(x(d^{\mathrm{in}}, d^{\mathrm{eq}})) = d^{\mathrm{eq}}.$$

This is a direct consequence of proposition 3.2 applied to the parametric maximization problem $\sup_x P(x, d^{\mathrm{in}}, d^{\mathrm{eq}})$ at the reference point $(d^{\mathrm{in}}, d^{\mathrm{eq}}) = (0, 0)$. By what has been said above, $\inf_V p \circ x$ is a lower bound for $\Phi$ on the closed ball $V$. In addition, concavity and finiteness at the origin entail that $\Phi$ is bounded above on $V$ (use an argument analogous to the one in the proof of proposition B.1). Hence, $\Phi$ is finite on $V$. Then, by concavity, we find $\Phi < \infty$ on $\mathbb{R}^{r^{\mathrm{in}}} \times \mathbb{R}^{r^{\mathrm{eq}}}$, which implies properness. □

Unsurprisingly, the embedding (B.8) leads to the classical Lagrangian function

$$L(x, d^{\mathrm{in}*}, d^{\mathrm{eq}*}) = \begin{cases} p(x) - \langle d^{\mathrm{in}*}, f^{\mathrm{in}}(x) \rangle - \langle d^{\mathrm{eq}*}, f^{\mathrm{eq}}(x) \rangle & \text{for } d^{\mathrm{in}*} \geq 0, \\ +\infty & \text{else.} \end{cases} \tag{B.9}$$

The primal problem associated with the Lagrangian (B.9) can be written as

$$\sup_x \inf_{(d^{\mathrm{in}*}, d^{\mathrm{eq}*})} L(x, d^{\mathrm{in}*}, d^{\mathrm{eq}*}), \tag{B.10}$$

whereas the dual problem reads

$$\inf_{(d^{\mathrm{in}*}, d^{\mathrm{eq}*})} \sup_x L(x, d^{\mathrm{in}*}, d^{\mathrm{eq}*}). \tag{B.11}$$

As before, it can generally be shown that the dual feasible set is convex and unbounded (see also proposition B.3). By the above reasoning (see proposition B.5), the optimal value function $\Phi$ is concave and finite on a convex neighborhood of the origin. Thus, $\Phi$ is continuous and subdifferentiable at $(0, 0)$, implying strong duality. Moreover, the primal-dual pair (B.10) and (B.11) is stable.

Notice that, by convention, the optimizers of the Lagrangian dual problems (B.6) and (B.11) are usually referred to as Lagrange multipliers.




C



Penalty-Based Optimization 



In this appendix we develop a penalty-based formulation of the convex maximization problem (B.1). Proposition C.1 and corollary C.2 below are important for the proof of proposition 5.12 in the main text. Here, the regularity conditions of appendix B are still assumed to hold true. Let $d^*_{\mathrm{opt}} \in \mathbb{R}^r$ be some minimizer of the dual problem (B.6) associated with (B.1), and choose an arbitrary bounding vector $D^* \geq d^*_{\mathrm{opt}}$. Then, we may introduce an equivalent unconstrained optimization problem, whose objective is concave on $\mathbb{R}^n$ and continuous on $\operatorname{int} \operatorname{dom} p$:

$$\sup_x \ p(x) - \langle D^*, [f(x)]^+ \rangle, \tag{C.1}$$

where $[\cdot]^+$ denotes the componentwise positive part. Notice that the objective function of (C.1) coincides with $p$ on the original feasible set $\mathcal{X}$, and any violation of the constraints is penalized proportionally to the bounding vector $D^*$.

Proposition C.1. The mathematical programs (B.1) and (C.1) have the same optimal values.

Proof. Strong duality guarantees that

$$\inf_{d^* \geq 0} \sup_x \ p(x) - \langle d^*, f(x) \rangle \tag{C.2}$$

has the same optimal value as (B.1). By assumption, $D^*$ represents an upper bounding vector for at least some solutions of (C.2). Therefore, the following optimization problem is equivalent to (C.2):

$$\begin{aligned} \inf_{d^*} \sup_x \ & p(x) - \langle d^*, f(x) \rangle \\ \text{s.t.} \ & D^* \geq d^* \geq 0. \end{aligned} \tag{C.3}$$

The additional constraint $D^* \geq d^*$ is redundant as it is non-binding in the optimum. Then, interchanging the ‘inf’ and ‘sup’ operators we obtain

$$\sup_x \inf_{D^* \geq d^* \geq 0} \ p(x) - \langle d^*, f(x) \rangle = \sup_x \ p(x) - \langle D^*, [f(x)]^+ \rangle, \tag{C.4}$$

where the inner infimum is attained at $d^*_i = D^*_i$ if $f_i(x) > 0$ and at $d^*_i = 0$ otherwise. Note that (C.4) is equivalent to (C.3) by strong duality.¹ In summary, each of the optimization problems (C.2) through (C.4) has the same optimal value as the original problem (B.1). This observation completes the proof. □
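Proposition C.1 requires the bounding vector to dominate an optimal multiplier. The Python sketch below illustrates this on a hypothetical one-dimensional instance of (B.1) that is not taken from the text. The instance maximizes $p(x) = -(x-2)^2$ subject to $f(x) = x - 1 \leq 0$. Its optimal value is $-1$, and stationarity $-2(x-2) - d^* = 0$ at $x = 1$ gives the multiplier $d^*_{\mathrm{opt}} = 2$. A grid search over $x$ is an assumption standing in for an exact solver. The penalized problem (C.1) should reproduce the value $-1$ for $D^* = 2$ and $D^* = 5$. For the too small bound $D^* = 1$, it should instead reach the larger value $-0.75$ at the infeasible point $x = 1.5$.

```python
def p(x):
    return -(x - 2.0) ** 2            # hypothetical concave objective


def f(x):
    return x - 1.0                    # hypothetical constraint f(x) <= 0; optimal multiplier is 2


def penalized(x, bound):
    """Objective of (C.1) for r = 1: p(x) - D* [f(x)]^+ with D* = bound."""
    return p(x) - bound * max(f(x), 0.0)


xs = [-5.0 + 10.0 * i / 10000 for i in range(10001)]    # grid on [-5, 5], contains 1.0 and 1.5
constrained = max(p(x) for x in xs if f(x) <= 0.0)
results = {}
for bound in (1.0, 2.0, 5.0):
    values = [penalized(x, bound) for x in xs]
    k = max(range(len(xs)), key=values.__getitem__)
    results[bound] = (values[k], xs[k])
    print(f"D* = {bound}: value {values[k]:.4f} at x = {xs[k]:.3f} (constrained optimum {constrained:.4f})")
```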

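Let us next discuss maximization problems of the form (B.7) with both inequality and equality constraints. Arguing as in the proof of proposition C.1, one can show that (B.7) is equivalent to

$$\sup_x \ p(x) - \langle D^{\mathrm{in}*}, [f^{\mathrm{in}}(x)]^+ \rangle - \langle D^{\mathrm{eq}*+}, [f^{\mathrm{eq}}(x)]^+ \rangle - \langle D^{\mathrm{eq}*-}, [-f^{\mathrm{eq}}(x)]^+ \rangle. \tag{C.5}$$

Thereby, $D^{\mathrm{in}*} \in \mathbb{R}^{r^{\mathrm{in}}}$ and $D^{\mathrm{eq}*+}, D^{\mathrm{eq}*-} \in \mathbb{R}^{r^{\mathrm{eq}}}$ are arbitrary bounding vectors which satisfy the inequalities

$$D^{\mathrm{in}*} \geq d^{\mathrm{in}*}, \quad D^{\mathrm{eq}*+} \geq [d^{\mathrm{eq}*}]^+, \quad \text{and} \quad D^{\mathrm{eq}*-} \geq [-d^{\mathrm{eq}*}]^+$$

for some solution $(d^{\mathrm{in}*}, d^{\mathrm{eq}*})$ of the dual problem (B.11). To simplify notation, one can reformulate (B.7) as

$$\sup_x \{ p(x) \mid f(x) \leq 0 \} \quad \text{with} \quad f := (f^{\mathrm{in}}, f^{\mathrm{eq}}, -f^{\mathrm{eq}}) \tag{C.6}$$

and (C.5) as

$$\sup_x \ p(x) - \langle D^*, [f(x)]^+ \rangle \quad \text{with} \quad D^* := (D^{\mathrm{in}*}, D^{\mathrm{eq}*+}, D^{\mathrm{eq}*-}). \tag{C.7}$$

Formally speaking, we may state the following corollary C.2, which generalizes proposition C.1 by allowing for equality constraints.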

Corollary C.2. The mathematical programs (C.6) and (C.7) have the same optimal values.
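The stacked formulation (C.6)/(C.7) treats an equality constraint as two opposite inequalities, each with its own bounding value: $D^{\mathrm{eq}*+} \geq [d^{\mathrm{eq}*}]^+$ and $D^{\mathrm{eq}*-} \geq [-d^{\mathrm{eq}*}]^+$. The Python sketch below checks corollary C.2 on a hypothetical instance without inequality constraints: maximize $p(x) = -(x-3)^2$ subject to $f^{\mathrm{eq}}(x) = x - 1 = 0$. Stationarity of the Lagrangian (B.9) at $x = 1$ gives $d^{\mathrm{eq}*} = 4$. Hence $(D^{\mathrm{eq}*+}, D^{\mathrm{eq}*-}) = (4, 0)$ and $(6, 1)$ are admissible, while $(3, 0)$ is not. A grid search over $x$ is an assumption standing in for an exact solver.

```python
def p(x):
    return -(x - 3.0) ** 2                       # hypothetical concave objective


def f_stacked(x):
    """Stacked constraint map of (C.6) for the single equality x - 1 = 0: (f_eq, -f_eq)."""
    return (x - 1.0, -(x - 1.0))


def penalized(x, bounds):
    """Objective of (C.7): p(x) - <D*, [f(x)]^+> with D* = bounds = (D_eq+, D_eq-)."""
    return p(x) - sum(b * max(v, 0.0) for b, v in zip(bounds, f_stacked(x)))


xs = [-5.0 + 10.0 * i / 10000 for i in range(10001)]   # grid on [-5, 5], contains 1.0 and 1.5
exact = p(1.0)                                          # x = 1 is the only feasible point
best = {bounds: max(penalized(x, bounds) for x in xs)
        for bounds in [(4.0, 0.0), (6.0, 1.0), (3.0, 0.0)]}
for bounds, value in best.items():
    print(bounds, round(value, 4), "exact" if abs(value - exact) < 1e-9 else "too weak")
```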

The general findings of this appendix are related to results on exact penalty functions for nonlinear programs [115]. A version of proposition C.1 for linear programs can be found in [74].

¹ Consider the embedding

$$P(x, d) := p(x) - \langle D^*, [f(x) - d]^+ \rangle$$

and use an elementary argument to show that the corresponding value function $\Phi(d) := \sup_x P(x, d)$ is subdifferentiable on the entire space. Thus, strong duality follows.




D 



Parametric Families of Linear Functions 



In this appendix we derive a useful result about convex mappings which are linear affine in some of their arguments. Proposition D.1 below helps us to gain deeper insights into the properties of an important class of constraint functions considered in section 5.2.

Proposition D.1. Consider a vector-valued mapping $w : \mathbb{R}^L \to \mathbb{R}^n$ and a real-valued mapping $\kappa : \mathbb{R}^L \to \mathbb{R}$. Moreover, assume that the function

$$f : \mathbb{R}^n \times \mathbb{R}^L \to \mathbb{R}, \quad (x, \xi) \mapsto \langle w(\xi), x \rangle + \kappa(\xi)$$

is finite and jointly convex in $(x, \xi)$ on a convex neighborhood of $(\hat x, \hat \xi)$. Then, $w$ is constant and $\kappa$ is convex on a convex neighborhood of $\hat \xi$.

Proof. Without loss of generality we may assume that $(\hat x, \hat \xi) = (0, 0)$.¹ Then, there are a tolerance $\varepsilon > 0$ and two compact cubes

$$U := \{ x \in \mathbb{R}^n \mid \|x\|_\infty \leq \varepsilon \} \quad \text{and} \quad V := \{ \xi \in \mathbb{R}^L \mid \|\xi\|_\infty \leq \varepsilon \}$$

such that $f$ is finite and jointly convex in $(x, \xi)$ on a convex neighborhood of $U \times V$. Here, $\|\cdot\|_\infty$ stands for the vector $\infty$-norm. By fixing $x = 0$, it can easily be seen that convexity of $f$ requires $\kappa$ to be convex on a convex neighborhood of $V$. Moreover, $w$ and $\kappa$ are necessarily Lipschitz continuous on $V$. For the further argumentation it is convenient to introduce an extended-real-valued function which is convex on the entire underlying space:

$$\bar f(x, \xi) := \begin{cases} f(x, \xi) & \text{for } (x, \xi) \in U \times V, \\ +\infty & \text{else.} \end{cases}$$

¹ If $(\hat x, \hat \xi) \neq (0, 0)$, we may set $x' := x - \hat x$ and $\xi' := \xi - \hat \xi$. Moreover, define

$$f'(x', \xi') := \langle w'(\xi'), x' \rangle + \kappa'(\xi'),$$

where $w'(\xi') := w(\xi' + \hat \xi)$ and $\kappa'(\xi') := \kappa(\xi' + \hat \xi) + \langle w(\xi' + \hat \xi), \hat x \rangle$. This $f'$ is finite and convex on a neighborhood of the origin. If we can show that $w'$ is constant and $\kappa'$ is convex on a neighborhood of $\xi' = 0$, then we find immediately that $w$ is constant and $\kappa$ is convex on a neighborhood of $\xi = \hat \xi$.

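By construction, $\bar f$ is a proper convex function on $\mathbb{R}^n \times \mathbb{R}^L$. Let us then calculate the convex conjugate of $\bar f$ with respect to the first argument. We find

$$\bar f^*(x^*, \xi) = \sup_x \{ \langle x^*, x \rangle - \bar f(x, \xi) \} = \begin{cases} \varepsilon \|x^* - w(\xi)\|_1 - \kappa(\xi) & \text{for } \xi \in V, \\ -\infty & \text{else,} \end{cases}$$

where $\|\cdot\|_1$ stands for the vector 1-norm. Since $\bar f$ is convex on $\mathbb{R}^n \times \mathbb{R}^L$, the conjugate $\bar f^*$ is convex in $x^*$ for all $\xi \in \mathbb{R}^L$ and concave in $\xi$ for all $x^* \in \mathbb{R}^n$. Assume now that $w$ is not constant on $V$. Then, there are two vectors $\xi_0, \xi_1 \in V$ such that $w(\xi_0) \neq w(\xi_1)$. Moreover, we may introduce a linear affine function

$$\tilde \xi : [0, 1] \to V \quad \text{such that} \quad \tilde \xi(0) = \xi_0 \ \text{and} \ \tilde \xi(1) = \xi_1.$$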

By construction, $w \circ \tilde \xi$ and $\kappa \circ \tilde \xi$ are Lipschitz continuous, and the mapping

$$\varphi(s) := \bar f^*(x^*, \tilde \xi(s)) = \varepsilon \|x^* - w \circ \tilde \xi(s)\|_1 - \kappa \circ \tilde \xi(s)$$

is concave on $[0, 1]$ for all $x^* \in \mathbb{R}^n$. Denote by $D$ the set of all $s \in [0, 1]$ where both $w \circ \tilde \xi$ and $\kappa \circ \tilde \xi$ are differentiable. The Lipschitz continuity guarantees via the theorem of Rademacher (see e.g. [36] for a modern proof) that $D$ has Lebesgue measure 1. In addition, we may write

$$w \circ \tilde \xi(s) = \int_0^s \dot w(u)\, du + w(\xi_0),$$

where $\dot w(s)$ is the first derivative of $w \circ \tilde \xi$ at $s$ whenever $s \in D$ and zero otherwise. Analogously, we find

$$\kappa \circ \tilde \xi(s) = \int_0^s \dot \kappa(u)\, du + \kappa(\xi_0),$$

where $\dot \kappa(s)$ is the first derivative of $\kappa \circ \tilde \xi$ at $s$ for $s \in D$ and zero otherwise. Next, define $D_0$ as the set of all $s \in D$ where $\dot w(s) \neq 0$. As $w(\xi_0) \neq w(\xi_1)$, we may conclude that $D_0$ has strictly positive Lebesgue measure. A fortiori, $D_0$ is non-empty. Choose $s_0 \in D_0$, and set $x^* = w \circ \tilde \xi(s_0)$. Then, the mapping $\varphi$ can be expanded around $s_0$:

$$\varphi(s) - \varphi(s_0) = \varepsilon \|\dot w(s_0)\|_1 |s - s_0| - \dot \kappa(s_0)(s - s_0) + o(s - s_0).$$

By construction, $\|\dot w(s_0)\|_1$ is strictly positive. Hence the right derivative of $\varphi$ at $s_0$ exceeds the left derivative by $2\varepsilon \|\dot w(s_0)\|_1 > 0$, which contradicts concavity of $\varphi$. Thus, our assumption is false, and $w$ is constant on $V$. □

The assertions of proposition D.1 can be proved much more directly if it is assumed that $w$ and $\kappa$ are twice continuously differentiable on a convex neighborhood of $\hat \xi$. In fact, by calculating the Hessian of $f$ at an arbitrary point $(x, \xi) \in U \times V$, one can easily verify that the first derivative of $w$ vanishes and the Hessian of $\kappa$ is positive semidefinite. Otherwise the Hessian of $f$ would not be positive semidefinite at $(x, \xi)$, implying that $f$ would not be (locally) convex.
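The twice-differentiable argument can be illustrated in the smallest case $n = L = 1$. Take the hypothetical choice $w(\xi) = \xi$ (not constant) and $\kappa(\xi) = \xi^2$ (convex). The Hessian of $f(x, \xi) = w(\xi) x + \kappa(\xi)$ is then $\begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix}$, and its smallest eigenvalue $1 - \sqrt{2}$ is negative. The Python sketch below computes this eigenvalue from the closed-form expression for symmetric $2 \times 2$ matrices and exhibits an explicit violation of midpoint convexity. Replacing $w$ by a constant removes the off-diagonal entry and restores positive semidefiniteness.

```python
import math


def min_eigenvalue_2x2(a, b, c):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    return 0.5 * (a + c) - math.sqrt(0.25 * (a - c) ** 2 + b * b)


def f(x, xi):
    """Hypothetical f(x, xi) = w(xi) * x + kappa(xi) with w(xi) = xi and kappa(xi) = xi**2."""
    return xi * x + xi ** 2


# Hessian entries of f: f_xx = 0, f_x,xi = w'(xi) = 1, f_xi,xi = w''(xi) * x + kappa''(xi) = 2.
lam_min = min_eigenvalue_2x2(0.0, 1.0, 2.0)
# With a constant w (say w = 1) the mixed derivative vanishes: Hessian [[0, 0], [0, 2]].
lam_min_const = min_eigenvalue_2x2(0.0, 0.0, 2.0)

# Midpoint test between (x, xi) = (2, -1) and (-2, 1): convexity would require f(0, 0) <= -1.
midpoint_value = f(0.0, 0.0)
average_value = 0.5 * (f(2.0, -1.0) + f(-2.0, 1.0))
print(lam_min, lam_min_const, midpoint_value, average_value)
```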




E 



Lipschitz Continuity of sup-Projections 



In this appendix we study the parametric dependence of the Lagrange multipliers associated with a family of maximization problems. To this end, we need a specific result about Lipschitz continuity of sup-projections. This technical result is needed for the proof of proposition 5.12 in the main text. For the sake of transparent notation, we consider here a simple parametric optimization problem over the Euclidean space $\mathbb{R}^n$

$$\begin{aligned} \Phi(\eta, \xi) = \sup_{x \in \mathbb{R}^n} \ & p(x, \eta, \xi) \\ \text{s.t.} \ & f(x, \xi) \leq 0 \end{aligned} \tag{E.1}$$

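which depends on the parameters $\eta \in \mathbb{R}^K$ and $\xi \in \mathbb{R}^L$. The extended-real-valued objective function $p$ is assumed to be concave in $x$. Moreover, the vector-valued constraint functions $f^{\mathrm{in}} : \mathbb{R}^n \times \mathbb{R}^L \to \mathbb{R}^{r^{\mathrm{in}}}$ and $f^{\mathrm{eq}} : \mathbb{R}^n \times \mathbb{R}^L \to \mathbb{R}^{r^{\mathrm{eq}}}$ corresponding to inequality and equality constraints are continuous, and the combination

$$f := (f^{\mathrm{in}}, f^{\mathrm{eq}}, -f^{\mathrm{eq}})$$

is convex in $x$, where $r = r^{\mathrm{in}} + 2 \cdot r^{\mathrm{eq}}$. The constraint functions define a closed-valued feasible set mapping

$$\mathcal{X} : \xi \mapsto \mathcal{X}(\xi) := \{ x \mid f(x, \xi) \leq 0 \}.$$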

Next, let $\Theta \subset \mathbb{R}^K$ and $\Xi \subset \mathbb{R}^L$ be compact (not necessarily convex) sets, and set $Z := \Theta \times \Xi$. Denote by $Y$ the graph of the feasible set mapping $\mathcal{X}$ over $Z$. Then, we require $p$ to be a subdifferentiable saddle function on a convex neighborhood of $Y$, being convex in $\eta$ and jointly concave in $(x, \xi)$. Moreover, $f$ is regularizable in the following sense: there is a continuous vector-valued function $\kappa$ with values in $\mathbb{R}^r$ which is constant in $x$ and $\eta$, and both $\kappa$ and $f + \kappa$ are convex functions on a convex neighborhood of $Y$. It is convenient to write the mapping $\kappa$ as

$$\kappa = (\kappa^{\mathrm{in}}, \kappa^{\mathrm{eq}+}, \kappa^{\mathrm{eq}-}) \quad \text{where} \quad \begin{cases} \kappa^{\mathrm{in}} : \mathbb{R}^L \to \mathbb{R}^{r^{\mathrm{in}}}, \\ \kappa^{\mathrm{eq}+} : \mathbb{R}^L \to \mathbb{R}^{r^{\mathrm{eq}}}, \\ \kappa^{\mathrm{eq}-} : \mathbb{R}^L \to \mathbb{R}^{r^{\mathrm{eq}}}. \end{cases}$$

This representation reflects the grouping of $f = (f^{\mathrm{in}}, f^{\mathrm{eq}}, -f^{\mathrm{eq}})$ by inequality and equality constraints. By applying proposition D.1 in appendix D to each component of the regularized constraint function $f^{\mathrm{eq}} + \kappa^{\mathrm{eq}+}$ and for every reference point $(x, \eta, \xi) \in Y$, we may conclude that

$$f^{\mathrm{eq}}(x, \xi) = W x - h(\xi)$$

on a neighborhood of $Y$. Thereby, the matrix $W$ is constant, and the mapping $h$ can be represented as a difference of two convex functions. Without loss of generality one may assume that $h = \kappa^{\mathrm{eq}+} - \kappa^{\mathrm{eq}-}$. Finally, we postulate that the parametric maximization problem (E.1) satisfies Slater’s constraint qualification for any $(\eta, \xi) \in Z$, and that the multifunction $\mathcal{X}$ is bounded on a neighborhood of $Z$.

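In order to study the Lagrange multipliers associated with the explicit constraints in (E.1), we should first consider a family of perturbed maximization problems

$$
\begin{aligned}
\Phi'(d^{\mathrm{in}}, d^{\mathrm{eq}}, \eta, \xi) := \sup_{x \in \mathbb{R}^n} \ & \rho(x, \eta, \xi) \\
\text{s.t.} \ & f^{\mathrm{in}}(x, \xi) \le d^{\mathrm{in}} \\
& f^{\mathrm{eq}}(x, \xi) = d^{\mathrm{eq}},
\end{aligned}
\tag{E.2}
$$

which depend on the perturbation parameters $d^{\mathrm{in}} \in \mathbb{R}^{r^{\mathrm{in}}}$ and $d^{\mathrm{eq}} \in \mathbb{R}^{r^{\mathrm{eq}}}$ (see appendix B for an introduction to Lagrangian duality). It is useful to introduce a new feasible set mapping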

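$$(d^{\mathrm{in}}, d^{\mathrm{eq}}, \xi) \mapsto X'(d^{\mathrm{in}}, d^{\mathrm{eq}}, \xi) := \{x \mid f^{\mathrm{in}}(x, \xi) \le d^{\mathrm{in}},\ f^{\mathrm{eq}}(x, \xi) = d^{\mathrm{eq}}\},$$

which depends both on $\xi$ and on the perturbation parameters. Next, define

$$Z' := \{(0, 0)\} \times \Theta \times \Xi \subset \mathbb{R}^{r^{\mathrm{in}}} \times \mathbb{R}^{r^{\mathrm{eq}}} \times \mathbb{R}^k \times \mathbb{R}^l.$$

Moreover, denote by $Y'$ the graph of the feasible set mapping $X'$ over $Z'$. For the subsequent argument we need some preliminary results about the new feasible set mapping.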
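A hypothetical toy instance of (E.2), not taken from the text, shows how the perturbation parameters enter: $n = 2$, no dependence on $\eta$, $\rho(x) = -(x_1 - 2)^2 - (x_2 - 2)^2$, $f^{\mathrm{in}}(x, \xi) = (x_1 - \xi, -x_1 - 1)$ and $f^{\mathrm{eq}}(x, \xi) = x_1 - x_2$. The equality eliminates $x_2 = x_1 - d^{\mathrm{eq}}$, so $X'(d^{\mathrm{in}}, d^{\mathrm{eq}}, \xi)$ is a segment parametrized by $x_1 \in [-1 - d^{\mathrm{in}}_2, \xi + d^{\mathrm{in}}_1]$ and $\Phi'$ has a closed form. The function name `perturbed_value` is illustrative.

```python
import math
from typing import Sequence


def perturbed_value(d_in: Sequence[float], d_eq: float, xi: float) -> float:
    """Closed-form Phi'(d_in, d_eq, xi) for a hypothetical toy instance of (E.2).

    Toy data (assumed, not from the text), with no dependence on eta:
        rho(x)      = -(x1 - 2)**2 - (x2 - 2)**2    (concave)
        f_in(x, xi) = (x1 - xi, -x1 - 1)            (r_in = 2)
        f_eq(x, xi) = x1 - x2                       (r_eq = 1)
    """
    lo, hi = -1.0 - d_in[1], xi + d_in[0]   # range of x1 allowed by f_in <= d_in
    if lo > hi:
        return -math.inf                     # X'(d_in, d_eq, xi) is empty
    # f_eq = d_eq forces x2 = x1 - d_eq; the reduced objective in x1 is a concave
    # parabola peaking at x1 = 2 + d_eq / 2, so clip that peak to [lo, hi].
    x1 = min(max(2.0 + d_eq / 2.0, lo), hi)
    x2 = x1 - d_eq
    return -(x1 - 2.0) ** 2 - (x2 - 2.0) ** 2


print(perturbed_value([0.0, 0.0], 0.0, 1.0))   # -2.0, the unperturbed value at xi = 1
print(perturbed_value([0.5, 0.0], 0.0, 1.0))   # -0.5, relaxing d_in cannot hurt
print(perturbed_value([-3.0, 0.0], 0.0, 1.0))  # -inf, empty feasible set
```

At $(d^{\mathrm{in}}, d^{\mathrm{eq}}) = (0, 0)$ the perturbed problem reduces to the unperturbed one, and enlarging $d^{\mathrm{in}}$ enlarges $X'$, so $\Phi'$ can only increase.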

Proposition E.1. The feasible set mapping $X'$ is non-empty-valued, bounded, and upper semicontinuous on a neighborhood of $Z'$.
Proof. Slater's constraint qualification implies via proposition 3.2 that the feasible set mapping $X'$ is non-empty-valued on a neighborhood of $Z'$. Next, we prove boundedness. Let us introduce an auxiliary constraint function

$$g(d^{\mathrm{in}}, d^{\mathrm{eq}}, x, \xi) := \max\Big\{ \max\{ f^{\mathrm{in}}_i(x, \xi) - d^{\mathrm{in}}_i ;\ i = 1, \dots, r^{\mathrm{in}} \},\ \max\{ |f^{\mathrm{eq}}_i(x, \xi) - d^{\mathrm{eq}}_i| ;\ i = 1, \dots, r^{\mathrm{eq}} \} \Big\},$$

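which is continuous and convex in $x$ on the entire underlying space. Due to boundedness of $X$ on a neighborhood of $Z$, there is a compact neighborhood $V'$ of $\Xi$ and a positive real number $R$ such that

$$X(\xi) = \{x \mid g(0, 0, x, \xi) \le 0\} \subset B_R \quad \forall \xi \in V', \tag{E.3}$$

where $B_R := \{x \mid \|x\| < R\}$ represents the open ball of radius $R$ around the origin. Then, the inclusion (E.3) and continuity of $g$ imply that

$$g(0, 0, x, \xi) \ge \varepsilon > 0 \quad \forall (x, \xi) \in S_R \times V',$$

where $S_R := \{x \mid \|x\| = R\}$, and $\varepsilon$ is some strictly positive real number. (Indeed, $g(0, 0, \cdot, \cdot)$ is strictly positive on $S_R \times V'$ by (E.3), and it attains its minimum on this compact set.) By construction, the auxiliary constraint function $g$ is uniformly continuous on any bounded set. Thus, there exists a neighborhood $U'$ of $(d^{\mathrm{in}}, d^{\mathrm{eq}}) = (0, 0)$ such that

$$g(d^{\mathrm{in}}, d^{\mathrm{eq}}, x, \xi) > 0 \quad \forall (d^{\mathrm{in}}, d^{\mathrm{eq}}) \in U',\ (x, \xi) \in S_R \times V'.$$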

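As implied by proposition 3.2, the feasible set mapping $X'$ locally has a continuous selector on $Z'$. Consequently, there exists a neighborhood $V''$ of $\Xi$ and a neighborhood $U''$ of $(d^{\mathrm{in}}, d^{\mathrm{eq}}) = (0, 0)$ such that

$$X'(d^{\mathrm{in}}, d^{\mathrm{eq}}, \xi) \cap B_R \neq \emptyset \quad \forall (d^{\mathrm{in}}, d^{\mathrm{eq}}) \in U'',\ \xi \in V''.$$

Setting $U := U' \cap U''$ and $V := V' \cap V''$, and using convexity of $g$ in the decision variables, we may conclude that

$$X'(d^{\mathrm{in}}, d^{\mathrm{eq}}, \xi) = \{x \mid g(d^{\mathrm{in}}, d^{\mathrm{eq}}, x, \xi) \le 0\} \subset B_R \quad \forall (d^{\mathrm{in}}, d^{\mathrm{eq}}) \in U,\ \xi \in V.$$

Indeed, if some feasible point lay outside $B_R$, the line segment joining it to a feasible point in $B_R$ would intersect $S_R$; by convexity of $g$ in $x$, the function $g$ would be non-positive at that intersection, contradicting the positivity of $g$ on $S_R$.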

This observation proves boundedness of the feasible set mapping X' on a 
neighborhood of Z' . Moreover, X' has a closed graph since the constraint 
functions are continuous on the entire space. Upper semicontinuity of X' on 
a neighborhood of Z' then follows from [13, proposition 11.9(b)]. □ 
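The key step of the boundedness argument, positivity of $g$ on $S_R \times V'$ under small perturbations, can be checked numerically on a hypothetical toy instance (not from the text): $x \in \mathbb{R}^2$, $f^{\mathrm{in}}(x, \xi) = \|x\|^2 - 1 - \xi$, $f^{\mathrm{eq}}(x, \xi) = x_1 - x_2$, $V' = [-0.1, 0.1]$ (sampled at three points) and $R = 2$. Sampling the circle only approximates the minimum over $S_R$, so this is a sanity check under these assumptions, not part of the proof.

```python
import math
from typing import Sequence, Tuple


# Hypothetical toy instance (assumed, not from the text), r_in = r_eq = 1.
def g(d_in: float, d_eq: float, x: Sequence[float], xi: float) -> float:
    """Auxiliary function g of the proof, specialized to the toy data."""
    f_in = x[0] ** 2 + x[1] ** 2 - 1.0 - xi
    f_eq = x[0] - x[1]
    return max(f_in - d_in, abs(f_eq - d_eq))


def min_g_on_sphere(radius: float, d_in: float, d_eq: float,
                    xis: Sequence[float], samples: int = 720) -> float:
    """Sampled minimum of g over S_R x {xis}, where S_R is the circle of the given radius."""
    values = []
    for xi in xis:
        for k in range(samples):
            t = 2.0 * math.pi * k / samples
            point: Tuple[float, float] = (radius * math.cos(t), radius * math.sin(t))
            values.append(g(d_in, d_eq, point, xi))
    return min(values)


XIS = [-0.1, 0.0, 0.1]  # sample points of the assumed neighborhood V'
print(min_g_on_sphere(2.0, 0.0, 0.0, XIS))   # about 2.9, playing the role of epsilon
print(min_g_on_sphere(2.0, 0.5, 0.3, XIS))   # about 2.4, still strictly positive

# Along the line x1 = x2 (so that f_eq = 0 = d_eq holds exactly), every feasible
# sample point lies inside the open ball B_R, as the proof predicts.
feasible = [s for s in (k / 100 for k in range(-300, 301))
            if g(0.5, 0.0, (s, s), 0.1) <= 0.0]
print(max(math.hypot(s, s) for s in feasible) < 2.0)  # True
```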

Corollary E.2. For any neighborhood $U$ of $Y'$ there is a neighborhood $V$ of $Z'$ such that the graph of $X'$ over $V$ is a subset of $U$.

Proof. Use upper semicontinuity of X' as in proposition 3.10. □ 
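Proposition E.3. There are open bounded (not necessarily convex) sets

$$
\begin{aligned}
Y_\cap &\subset \mathbb{R}^{r^{\mathrm{in}}} \times \mathbb{R}^{r^{\mathrm{eq}}} \times \mathbb{R}^n \times \mathbb{R}^l, & Y_\cup &\subset \mathbb{R}^k, \\
Z_\cap &\subset \mathbb{R}^{r^{\mathrm{in}}} \times \mathbb{R}^{r^{\mathrm{eq}}} \times \mathbb{R}^l, & Z_\cup &\subset \mathbb{R}^k
\end{aligned}
\tag{E.4}
$$

with the following properties: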



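(a) $Y_\cap \times Y_\cup$ is a neighborhood of $Y'$;

(b) $Z_\cap \times Z_\cup$ is a neighborhood of $Z'$;

(c) $\rho$ is a continuous saddle function on $\operatorname{co} Y_\cap \times \operatorname{co} Y_\cup$, being concave in $(x, \xi)$, convex in $\eta$, and constant in $(d^{\mathrm{in}}, d^{\mathrm{eq}})$;

(d) both $f + \kappa$ and $\kappa$ are continuous convex functions on $\operatorname{co} Y_\cap \times \operatorname{co} Y_\cup$;

(e) the graph of $X'$ over $Z_\cap \times Z_\cup$ is a subset of $Y_\cap \times Y_\cup$;

(f) the multifunction $X'$ is non-empty compact-valued on $Z_\cap \times Z_\cup$.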



Proof. By the general assumptions, it is easy to find open bounded sets $Y_\cap$ and $Y_\cup$ satisfying the requirements (a), (c), and (d). Furthermore, by proposition E.1 and corollary E.2 it is possible to find open bounded sets $Z_\cap$ and $Z_\cup$ which fulfill the remaining requirements (b), (e), and (f). □

Proposition E.4. The value function $\Phi'$ is Lipschitz continuous on a neighborhood of $Z'$.



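Proof. Define an auxiliary objective function

$$q(x, \eta, \xi) := \begin{cases} \rho(x, \eta, \xi) & \text{on } \operatorname{co} Y_\cap \times \operatorname{co} Y_\cup, \\ +\infty & \text{on } \operatorname{co} Y_\cap \times (\operatorname{co} Y_\cup)^c, \\ -\infty & \text{everywhere else.} \end{cases}$$

By assertion (c) of proposition E.3, $q$ is a saddle function on the entire underlying space. Moreover, we introduce two auxiliary value functions: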

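$$
\begin{aligned}
Q(d^{\mathrm{in}}, d^{\mathrm{eq}}, \eta, \xi) := \sup_{x \in \mathbb{R}^n} \ & q(x, \eta, \xi) \\
\text{s.t.} \ & f^{\mathrm{in}}(x, \xi) \le d^{\mathrm{in}} \\
& f^{\mathrm{eq}}(x, \xi) = d^{\mathrm{eq}},
\end{aligned}
\tag{E.5a}
$$

$$
\begin{aligned}
Q'(d^{\mathrm{in}}, d^{\mathrm{eq}}, \eta, \xi) := \sup_{x \in \mathbb{R}^n} \ & q(x, \eta, \xi) \\
\text{s.t.} \ & f^{\mathrm{in}\prime}(x, \xi) \le d^{\mathrm{in}} \\
& f^{\mathrm{eq}\prime}(x, \xi) = d^{\mathrm{eq}}.
\end{aligned}
\tag{E.5b}
$$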
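The three-case definition of $q$ follows the usual convention for extending a concave-convex saddle function beyond its domain: $-\infty$ wherever the concave argument leaves its domain, and $+\infty$ wherever only the convex argument does. A minimal Python sketch, assuming membership tests for $\operatorname{co} Y_\cap$ and $\operatorname{co} Y_\cup$ are available as predicates (the names `saddle_extension`, `in_cap`, `in_cup` and the toy data are hypothetical; the perturbation parameters are suppressed because $\rho$ does not depend on them):

```python
import math
from typing import Callable

Rho = Callable[[float, float, float], float]


def saddle_extension(rho: Rho,
                     in_cap: Callable[[float, float], bool],
                     in_cup: Callable[[float], bool]) -> Rho:
    """Extend rho(x, eta, xi) to the whole space as in the definition of q."""
    def q(x: float, eta: float, xi: float) -> float:
        if not in_cap(x, xi):
            return -math.inf          # (x, xi) outside co Y_cap: "everywhere else"
        if not in_cup(eta):
            return math.inf           # (x, xi) inside, eta outside co Y_cup
        return rho(x, eta, xi)        # both inside: q = rho
    return q


# Hypothetical toy data: rho is concave in (x, xi) and convex in eta;
# the convex sets are boxes.
q = saddle_extension(lambda x, eta, xi: -(x - xi) ** 2 + eta ** 2,
                     lambda x, xi: abs(x) <= 1.0 and abs(xi) <= 1.0,
                     lambda eta: abs(eta) <= 1.0)

print(q(0.5, 0.5, 0.0))   # 0.0
print(q(0.5, 2.0, 0.0))   # inf
print(q(2.0, 2.0, 0.0))   # -inf
```

Checking the concave argument first gives $-\infty$ priority, which matches "everywhere else" in the definition.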
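Notice that $Q'$ closely resembles $Q$. However, the potentially nonconvex constraint functions in (E.5a) have been replaced in (E.5b) by

$$f^{\mathrm{in}\prime} := f^{\mathrm{in}} + \kappa^{\mathrm{in}} \quad \text{and} \quad f^{\mathrm{eq}\prime} := f^{\mathrm{eq}} + \kappa^{\mathrm{eq}+} - \kappa^{\mathrm{eq}-}.$$

By construction, $f^{\mathrm{in}\prime}$ is jointly convex in $x$ and $\xi$, whereas $f^{\mathrm{eq}\prime}$ is affine in $x$ and constant in $\xi$ on $Y_\cap \times Y_\cup$. Next, define a homeomorphism

$$\iota: (d^{\mathrm{in}}, d^{\mathrm{eq}}, \eta, \xi) \mapsto \big(d^{\mathrm{in}} + \kappa^{\mathrm{in}}(\xi),\ d^{\mathrm{eq}} + \kappa^{\mathrm{eq}+}(\xi) - \kappa^{\mathrm{eq}-}(\xi),\ \eta,\ \xi\big).$$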
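Substituting the definition of $\iota$ into (E.5b) makes the relation between the two value functions explicit; the $\kappa$ terms cancel on both sides of each constraint:

$$
\begin{aligned}
Q'\big(\iota(d^{\mathrm{in}}, d^{\mathrm{eq}}, \eta, \xi)\big)
&= \sup_{x \in \mathbb{R}^n} \Big\{ q(x, \eta, \xi) \;\Big|\; f^{\mathrm{in}}(x, \xi) + \kappa^{\mathrm{in}}(\xi) \le d^{\mathrm{in}} + \kappa^{\mathrm{in}}(\xi), \\
&\qquad\qquad f^{\mathrm{eq}}(x, \xi) + \kappa^{\mathrm{eq}+}(\xi) - \kappa^{\mathrm{eq}-}(\xi) = d^{\mathrm{eq}} + \kappa^{\mathrm{eq}+}(\xi) - \kappa^{\mathrm{eq}-}(\xi) \Big\} \\
&= \sup_{x \in \mathbb{R}^n} \Big\{ q(x, \eta, \xi) \;\Big|\; f^{\mathrm{in}}(x, \xi) \le d^{\mathrm{in}},\ f^{\mathrm{eq}}(x, \xi) = d^{\mathrm{eq}} \Big\}
= Q(d^{\mathrm{in}}, d^{\mathrm{eq}}, \eta, \xi).
\end{aligned}
$$

Since $\kappa$ is continuous, $\iota$ is continuous with the continuous inverse $\iota^{-1}(d^{\mathrm{in}}, d^{\mathrm{eq}}, \eta, \xi) = (d^{\mathrm{in}} - \kappa^{\mathrm{in}}(\xi),\ d^{\mathrm{eq}} - \kappa^{\mathrm{eq}+}(\xi) + \kappa^{\mathrm{eq}-}(\xi),\ \eta,\ \xi)$, which is why it is a homeomorphism.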

It can easily be verified that $\iota$ relates the auxiliary value functions (E.5) through $Q = Q' \circ \iota$. This representation will be used to prove that $Q$ is locally Lipschitz continuous¹ on $Z_\cap \times Z_\cup$. By the properties of the objective and constraint functions in (E.5b), the perturbed value function $Q'$ is convex in $\eta$ and jointly concave in $(d^{\mathrm{in}}, d^{\mathrm{eq}}, \xi)$ on the entire underlying space (cf. also the related argument in the proof of theorem 3.14). Moreover, by the assertions (e) and (f) of proposition E.3, the parametric optimization problem (E.5a) is feasible for any parameter $(d^{\mathrm{in}}, d^{\mathrm{eq}}, \eta, \xi) \in Z_\cap \times Z_\cup$; its feasible set is compact by (f), and on this set $q$ coincides with the continuous function $\rho$ by (e). Then, the Weierstrass maximum theorem proves that $Q$ is pointwise finite on $Z_\cap \times Z_\cup$. This in turn implies pointwise finiteness of $Q'$ on $\iota(Z_\cap \times Z_\cup)$, which is open, but generally nonconvex. As $Q'$ is a saddle function, we may invoke [89, theorem 35.1] to show that $Q'$ is locally Lipschitz on $\iota(Z_\cap \times Z_\cup)$. By convexity of the mapping $\kappa$, the homeomorphism $\iota$ is locally Lipschitz on $Z_\cap \times Z_\cup$; this is a consequence of [89, theorem 24.7]. Hence, the composition $Q = Q' \circ \iota$ is locally Lipschitz on the open set $Z_\cap \times Z_\cup$.
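The identity $Q = Q' \circ \iota$ can also be confirmed numerically on a hypothetical toy instance (not from the text), using exact rational arithmetic so that equality constraints can be tested with `==`: $x = (x_1, x_2)$ ranges over a finite grid, $\rho(x) = -(x_1 - 1)^2 - (x_2 - 1)^2$, $f^{\mathrm{in}}(x, \xi) = x_1 - \xi^2$ (concave in $\xi$), $f^{\mathrm{eq}}(x, \xi) = x_1 - x_2 - h(\xi)$ with $h = \kappa^{\mathrm{eq}+} - \kappa^{\mathrm{eq}-}$, $\kappa^{\mathrm{in}}(\xi) = \kappa^{\mathrm{eq}+}(\xi) = \xi^2$ and $\kappa^{\mathrm{eq}-}(\xi) = 2\xi^2$. The grid maximization stands in for the supremum, so this is only a sketch.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Optional, Tuple

# Hypothetical toy instance (not from the text); eta plays no role.
Point = Tuple[Fraction, Fraction]
GRID = [Fraction(k, 4) for k in range(-8, 9)]  # x1, x2 in {-2, -7/4, ..., 2}


def rho(x: Point) -> Fraction:
    return -(x[0] - 1) ** 2 - (x[1] - 1) ** 2


def kappa_in(xi: Fraction) -> Fraction:
    return xi ** 2


def kappa_eq_plus(xi: Fraction) -> Fraction:
    return xi ** 2


def kappa_eq_minus(xi: Fraction) -> Fraction:
    return 2 * xi ** 2


def f_in(x: Point, xi: Fraction) -> Fraction:
    return x[0] - xi ** 2                    # not jointly convex in (x, xi)


def f_eq(x: Point, xi: Fraction) -> Fraction:
    # W x - h(xi) with W = (1, -1) and h = kappa_eq_plus - kappa_eq_minus
    return x[0] - x[1] - (kappa_eq_plus(xi) - kappa_eq_minus(xi))


def f_in_reg(x: Point, xi: Fraction) -> Fraction:
    return f_in(x, xi) + kappa_in(xi)        # f^{in'} = x1


def f_eq_reg(x: Point, xi: Fraction) -> Fraction:
    return f_eq(x, xi) + kappa_eq_plus(xi) - kappa_eq_minus(xi)   # f^{eq'} = x1 - x2


Constraint = Callable[[Point, Fraction], Fraction]


def grid_value(ineq: Constraint, eq: Constraint, d_in: Fraction,
               d_eq: Fraction, xi: Fraction) -> Optional[Fraction]:
    """Maximum of rho over grid points with ineq <= d_in and eq == d_eq (None if infeasible)."""
    vals = [rho(x) for x in product(GRID, GRID)
            if ineq(x, xi) <= d_in and eq(x, xi) == d_eq]
    return max(vals) if vals else None


def iota(d_in: Fraction, d_eq: Fraction, xi: Fraction) -> Tuple[Fraction, Fraction, Fraction]:
    return d_in + kappa_in(xi), d_eq + kappa_eq_plus(xi) - kappa_eq_minus(xi), xi


xi, d_in, d_eq = Fraction(1, 2), Fraction(1, 4), Fraction(0)
print(grid_value(f_in, f_eq, d_in, d_eq, xi))                  # -5/16  (Q)
print(grid_value(f_in_reg, f_eq_reg, *iota(d_in, d_eq, xi)))   # -5/16  (Q' composed with iota)
```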

By assertion (e) of proposition E.3, the optimal value function $\Phi'$ coincides with $Q$ on $Z_\cap \times Z_\cup$. Thus, $\Phi'$ is locally Lipschitz on $Z_\cap \times Z_\cup$ and globally Lipschitz on every compact subset of $Z_\cap \times Z_\cup$. This observation completes the proof. □

Proposition E.5. The Lagrange multipliers associated with the explicit constraints in the parametric maximization problem (E.1) are uniformly bounded for all $(\eta, \xi)$ in a neighborhood of $Z$.

Proof. As argued in proposition B.3, for any fixed parameter $(\eta, \xi) \in Z$ the dual feasible set of problem (E.1) is convex and unbounded. In contrast, the dual solution set, i.e. the set of Lagrange multipliers, is compact and reduces to the subdifferential of the perturbed value function $\Phi'(\cdot, \cdot, \eta, \xi)$ evaluated at $(d^{\mathrm{in}}, d^{\mathrm{eq}}) = (0, 0)$:

$$D^*_{\mathrm{opt}}(\eta, \xi) := \partial_{d^{\mathrm{in}}} \times \partial_{d^{\mathrm{eq}}}\, \Phi'(d^{\mathrm{in}}, d^{\mathrm{eq}}, \eta, \xi)\big|_{(d^{\mathrm{in}}, d^{\mathrm{eq}}) = (0, 0)}.$$

¹ The Lipschitzian properties of general inf-projections are analyzed by Wets [109]. Here, however, Lipschitz continuity of the sup-projection $Q$ can be established directly by exploiting the regularizability of the constraint functions and the saddle structure of $q$.
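By proposition E.4, the value function $\Phi'$ is Lipschitz continuous on a neighborhood of $Z'$ with some constant $\Lambda$. Thus, the following estimates hold:

$$
\left.
\begin{aligned}
D^{\mathrm{in}*} &\ge d^{\mathrm{in}*} \ge 0 \\
D^{\mathrm{eq}*+} &\ge d^{\mathrm{eq}*} \ge -D^{\mathrm{eq}*-}
\end{aligned}
\right\}
\quad \forall (d^{\mathrm{in}*}, d^{\mathrm{eq}*}) \in D^*_{\mathrm{opt}}(\eta, \xi),
\tag{E.6}
$$

where

$$D^{\mathrm{in}*} := (\underbrace{\Lambda, \dots, \Lambda}_{r^{\mathrm{in}} \text{ times}}) \quad \text{and} \quad D^{\mathrm{eq}*+} := D^{\mathrm{eq}*-} := (\underbrace{\Lambda, \dots, \Lambda}_{r^{\mathrm{eq}} \text{ times}}).$$

Nonnegativity of $d^{\mathrm{in}*}$ holds because $\Phi'$ is nondecreasing in $d^{\mathrm{in}}$, while every component of a subgradient of the Lipschitz function $\Phi'(\cdot, \cdot, \eta, \xi)$ is bounded in modulus by $\Lambda$. Consequently, the set of optimal dual solutions $D^*_{\mathrm{opt}}(\eta, \xi)$ is uniformly bounded on a neighborhood of $Z$. Thus, the claim follows. □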
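A one-dimensional toy instance (hypothetical, not from the text) illustrates (E.6): maximize $-(x - 2)^2$ subject to $x - 1 \le d^{\mathrm{in}}_1$ and $-x - 1 \le d^{\mathrm{in}}_2$, with $(\eta, \xi)$ fixed and $r^{\mathrm{eq}} = 0$. Here $\Phi'$ is differentiable at the origin, so the multiplier set is the single gradient, which the sketch approximates by central differences. The Lipschitz constant is estimated crudely from difference quotients on a grid; this is an assumption-laden stand-in for $\Lambda$, not a certified bound.

```python
import math
from typing import Tuple


# Hypothetical toy instance (not from the text).
def phi_prime(d1: float, d2: float) -> float:
    """Closed-form perturbed value function of the toy problem.

    maximize -(x - 2)**2  subject to  x - 1 <= d1  and  -x - 1 <= d2.
    """
    lo, hi = -1.0 - d2, 1.0 + d1
    if lo > hi:
        return -math.inf            # empty feasible set
    x = min(max(2.0, lo), hi)       # clip the unconstrained maximizer x = 2 to [lo, hi]
    return -(x - 2.0) ** 2


def multipliers(h: float = 1e-6) -> Tuple[float, float]:
    """Central-difference gradient of phi_prime at d = (0, 0)."""
    l1 = (phi_prime(h, 0.0) - phi_prime(-h, 0.0)) / (2.0 * h)
    l2 = (phi_prime(0.0, h) - phi_prime(0.0, -h)) / (2.0 * h)
    return l1, l2


def sampled_lipschitz(radius: float = 0.5, step: float = 0.01) -> float:
    """Largest axis-wise difference quotient of phi_prime on a grid around d = 0."""
    n = round(radius / step)
    pts = [k * step for k in range(-n, n + 1)]
    best = 0.0
    for a in pts:
        for b in pts:
            base = phi_prime(a, b)
            best = max(best,
                       abs(phi_prime(a + step, b) - base) / step,
                       abs(phi_prime(a, b + step) - base) / step)
    return best


lam = multipliers()
lip = sampled_lipschitz()
print(lam)   # approximately (2.0, 0.0)
print(lip)   # approximately 2.99
print(all(0.0 <= v <= lip for v in lam))   # True
```

The estimated multipliers agree with the KKT conditions ($\lambda_1 = -2(x - 2) = 2$ at the optimum $x = 1$, with the second constraint inactive), and both lie in $[0, \Lambda]$ as (E.6) requires.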

Finally, we state a technical result which is needed in the proof of proposition 5.12 in the main text. Assume that $\tilde\rho$ is an alternative objective function valued in the extended reals. Moreover, assume that $\tilde\rho$ is concave in the decision vector $x$, and consider the parametric optimization problem

$$
\begin{aligned}
\tilde\Phi(\eta, \xi) := \sup_{x \in \mathbb{R}^n} \ & \tilde\rho(x, \eta, \xi) \\
\text{s.t.} \ & f(x, \xi) \le 0.
\end{aligned}
\tag{E.7}
$$

Proposition E.6. Assume that $\tilde\rho = \rho$ on a neighborhood of $Y$. Then, the Lagrange multiplier sets of the problems (E.1) and (E.7) coincide at any reference point $(\eta, \xi) \in Z$.

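Proof. In order to study the Lagrange multipliers associated with the explicit constraints in (E.7), we have to consider the usual family of perturbed maximization problems

$$
\begin{aligned}
\tilde\Phi'(d^{\mathrm{in}}, d^{\mathrm{eq}}, \eta, \xi) := \sup_{x \in \mathbb{R}^n} \ & \tilde\rho(x, \eta, \xi) \\
\text{s.t.} \ & f^{\mathrm{in}}(x, \xi) \le d^{\mathrm{in}} \\
& f^{\mathrm{eq}}(x, \xi) = d^{\mathrm{eq}}.
\end{aligned}
$$

By corollary E.2 we may conclude that $\tilde\Phi' = \Phi'$ on a neighborhood of $Z'$. Notice that both $\Phi'$ and $\tilde\Phi'$ are concave in $(d^{\mathrm{in}}, d^{\mathrm{eq}})$ on the entire space. Since the subdifferential of a concave function at a point only depends on the function's values in an arbitrarily small neighborhood of that point, we find

$$\partial_{d^{\mathrm{in}}} \times \partial_{d^{\mathrm{eq}}}\, \tilde\Phi' = \partial_{d^{\mathrm{in}}} \times \partial_{d^{\mathrm{eq}}}\, \Phi'$$

on $Z'$, and the claim follows. □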
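The locality argument in this proof can be observed on a hypothetical one-dimensional example (not from the text): with the feasible set $[-1 - d^{\mathrm{in}}_2, 1 + d^{\mathrm{in}}_1]$, take $\rho(x) = -(x - 2)^2$ and the concave alternative $\tilde\rho(x) = \min\{\rho(x), 3 - 2x\}$, which agrees with $\rho$ for $x \le 3 - \sqrt{2}$ but not beyond. The maximization uses a ternary search, which is valid for concave functions; all function names are illustrative.

```python
import math
from typing import Callable, Tuple

Objective = Callable[[float], float]


# Hypothetical toy objectives (not from the text).
def rho(x: float) -> float:
    return -(x - 2.0) ** 2


def rho_tilde(x: float) -> float:
    return min(rho(x), 3.0 - 2.0 * x)   # concave; equals rho for x <= 3 - sqrt(2)


def maximize_concave(fn: Objective, lo: float, hi: float, iters: int = 200) -> float:
    """Ternary search for the maximum of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if fn(m1) < fn(m2):
            lo = m1   # by concavity, no point left of m1 exceeds fn(m2)
        else:
            hi = m2   # by concavity, no point right of m2 exceeds fn(m1)
    return fn((lo + hi) / 2.0)


def value(fn: Objective, d1: float, d2: float) -> float:
    """Perturbed value function for the constraints x - 1 <= d1 and -x - 1 <= d2."""
    lo, hi = -1.0 - d2, 1.0 + d1
    return maximize_concave(fn, lo, hi) if lo <= hi else -math.inf


def gradient_at_zero(fn: Objective, h: float = 1e-5) -> Tuple[float, float]:
    """Central-difference gradient of the value function at d = (0, 0)."""
    g1 = (value(fn, h, 0.0) - value(fn, -h, 0.0)) / (2.0 * h)
    g2 = (value(fn, 0.0, h) - value(fn, 0.0, -h)) / (2.0 * h)
    return g1, g2


print(gradient_at_zero(rho))         # approximately (2.0, 0.0)
print(gradient_at_zero(rho_tilde))   # the same multipliers
# Far from the feasible region the two problems do differ:
print(value(rho, 2.0, 0.0), value(rho_tilde, 2.0, 0.0))   # about 0.0 vs. about -0.17
```

Near $d = 0$ the two value functions coincide, so their gradients (the multipliers) agree even though $\tilde\rho \ne \rho$ globally.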